Preface


THIS BOOK is written primarily for education students who will 
take only a one- or a two-semester course in statistics. It is intended 
to be a general introduction to statistical methods as applied in edu- 
cational measurement and research. It does not presuppose previous 
work in statistics, nor mathematical knowledge beyond the simplest 
algebra. The approach is mainly through common sense and 
arithmetic. 

The book includes elementary treatment of most of the techniques 
and measures currently used in measurement and research. All of 
the topics can be covered in a two-semester course; in a shorter 
course a great many will ordinarily need to be omitted. The material 
is organized so that parts of each chapter may be omitted with little 
loss of continuity. This makes it possible, even in a short course, to 
give some attention to the use of statistics both in the reduction of 
data and in inference. Many of the topics are largely self-teaching 
and may be included at the option of the student. 

Sampling notions are introduced early and are carried along in-
formally until the final chapter, where they are treated logically.
This has several advantages. It provides a broad base for the theory
of inference. It makes it possible to deal more naturally and compre- 
hensively with descriptive statistics. Early and sustained attention 
to sampling considerations serves to emphasize one of the important 
lessons to be learned from statistics, namely, that the conclusions we 
reach, whether in ordinary thinking or in research, usually are 
based upon incomplete evidence. 

The introduction to the logic of inference in Chapter VIII is pur- 
posely broad and somewhat abstract. It has been my experience 
that even those who will never become proficient in selecting and 
applying statistical tests gain a fair understanding of the purpose 
and meaning of such tests if they are not burdened too soon with the
mechanics. The tests are rationalized with reference to the two types 
of inferential errors and their control. To stop short of this is, I 
think, both to oversimplify the theory of inference and to make it 
mysterious.

For several reasons I have given a good deal of space to the normal 
sampling distribution and large sample methods. Standard errors, 
even though inexact, are needed to supplement descriptive statis- 
tics, so widely used in educational work. Large sample methods 
provide a relatively easy approach to the ideas underlying statistical 
inference and to exact sampling methods. Moreover, they at least 
partly atone for their lack of elegance by their great convenience 
and practical value. 

Few of the illustrative data are fictitious. The real data included 
in Appendix B are voluminous and varied enough for many pur- 
poses. A great many of the exercises are an integral part of the text. 
The ideas they bring out might have been included in the dis- 
cussion, but the extension of theory to them will not usually be 
found difficult. Few of the exercises require extensive calculations.

I had originally planned to include proofs and derivations of at 
least the pivotal relationships and formulas, but the project grew out 
of hand. A few simple algebraic proofs are retained in the text and a 
few others in Appendix A, and reference is made to various others 
which use essentially the same approach and notation. It is not 
necessary to follow these, but it would be a mistake, I think, to 
assume that none in the “typical” class has the interest or com- 
petence to profit from them. I have tried to offset the limitations 
of a specific arithmetic approach by sustained emphasis upon the 
assumptions and conditions under which statistical operations and 
quantities have clear meaning and upon what statistical statements 
do and do not assert. 

I am indebted to Professor Ronald A. Fisher of Cambridge, to 
Dr. Frank Yates of Rothamsted, and to Oliver and Boyd Ltd. of 
Edinburgh for permission to reprint Table III from their book, 
Statistical Tables for Biological, Agricultural, and Medical Students; 
to Professor Fisher and Oliver and Boyd, Ltd., for permission to 
adapt Table V. B. and to quote from their book, Statistical Methods 
for Research Workers. I am also indebted to Catherine Thompson, 
to Maxine Merrington, and to E. S. Pearson, Editor of Biometrika,
for Tables E and F, which are abridged versions of tables originally 
published in Biometrika. My indebtedness to other authors and
publishers for permission to use various materials is acknowledged at 
appropriate places in later pages. 

I wish to thank Eric F. Gardner of Syracuse University, David V. 
Tiedeman of Harvard University, and Marshall J. Tyree of the 
Philadelphia Schools for reading parts of the manuscript with 
critical and judicious eye. Their interest and suggestions have been 
invaluable. I wish also to thank William B. Castetter of the Univer- 
sity of Pennsylvania for help in planning the book and for many 
valuable suggestions based upon his use of parts of the manuscript 
in mimeographed form. 

Merle W. Tate
Philadelphia 
Chapter I


Introduction 


Statistics as a Tool in Educational Research 


The search for knowledge needed in solving problems is uni- 
versal and endless. Outside of textbooks, few problems admit of 
only one solution. Since different kinds of knowledge may be ob- 
tained, it is typically the case that several solutions can be found 
for a given problem. The question of how a particular solution is 
reached is an important one, for it relates to that most persistent 
and complex question, “How do we know?” 

Sources of Knowledge. Generally speaking, we attempt to ob- 
tain the knowledge needed in solving a particular problem from one 
or more of four sources: (1) authority: expert testimony, opinions
of specialists; (2) inertia: habit, custom, tradition; (3) intuition:
self-evident propositions, indisputable premises, obvious truths; 
and (4) evidence: matters of fact.* Although the fourth source has 
come to have a prestige denied the other three, each of the three is, 
at one time or another, of value. Appeals to custom are useful in 
deciding questions relating to social amenities; appeals to authority 
or to intuition may be invaluable in dealing with a novel problem or 
one about which little is known. There are a great many problems
in school and society for which little trustworthy evidence is avail- 
able; there are many others whose solutions, despite a wealth of 
facts, turn on imponderables. 

Consider a group of teachers who are attempting to decide 
whether to adopt some proposed change in curriculum, instructional 
methods, or school practice. The teachers may resort to authority 

* There is unfortunately no simple definition of fact. As used here, fact refers
to something known, or capable of verification, directly through experience.
Facts are here considered to be the stuff of evidence. They are not true or false;
they just are, and constitute the criteria which make statements true or false.
and solicit the opinions of specialists regarding the merit of the pro-
posed change. Custom or tradition may operate in either of two
ways: the teachers may decide to “let well enough alone” and to
go on doing what has been done in the past, or they may survey 
comparable schools and decide upon the practice which is the most 
popular or customary. As a third alternative, the teachers may make
certain assumptions about the needs of students and of society and 
attempt to reach a decision by “if-then” argument, i.e., by logical 
deduction from the assumptions. If this is done, the decision will, 
of course, depend upon the assumptions which are made.

As a fourth alternative, the teachers may gather evidence relating 
to the results of the proposed change—as these have been observed 
in other schools or are observed through experimentation—and 
make their decision in accordance with the evidence, with the ex- 
pectation that the same results will obtain in the future. It is quite 
likely that the teachers will find the evidence to be inconclusive in 
one or more respects, and that consequently the final decision will 
have to be reached by weighing various considerations. 

Most of our educational problems are enormously complex, and 
it would be wrong to suppose that given the facts they can be solved. 
But it would be equally wrong to believe that workable solutions 
can be found without facts or in opposition to facts. We ordinarily 
expect expert testimony, at least in temporal matters, to be based 
upon matters of fact, if such are available. Custom and tradition 
give way, although slowly, when contradicted by the facts. The 
“reasonable” assumptions and the “indisputable” premises of con- 
venient syllogisms are constantly subject to scrutiny in the light of 
facts. 

When evidence is available or can be obtained, it is the mark of 
wisdom to use it. As a matter of common observation, factually 
supported solutions tend to be more widely acceptable, convincing, 
and more successful than any other kind. Moreover, whatever the 
method by which a particular problem is solved, the merit of the 
solution ordinarily is judged by its observable consequences. 

Research and Statistics. The search for factual solutions to 
problems commonly is called research. More formally, research is 
the systematic collection, analysis, and interpretation of facts relat- 
ing to a specific problem. Statistics, as a tool in research, deals with 
methods of collecting and interpreting numerical facts. A great
many of the educational problems which invite research relate to 
the measurement of differences between individuals, the appraisal 
of the results of instruction, or the organization and administration 
of the school. Problems of the first two classes usually involve the 
construction of measuring instruments, such as aptitude and 
achievement tests, and correlational and experimental procedures; 
those of the third class, the collection of comparative facts relating 
to revenues and expenditures, enrollments, characteristics of teach- 
ers and students, or the results of different kinds of school practices. 
When the evidence collected by testing, experimentation, or record- 
keeping is stated as numerical facts, as it usually is, statistical 
methods are essential to analysis and interpretation. 

Some social scientists are suspicious of figures. They believe that 
presenting evidence relating to man and his affairs in the form of 
numerical facts necessarily devitalizes and distorts the phenomenon 
which is under investigation. This amounts to the belief that social 
problems must be dealt with mainly on authoritative, traditional, 
or intuitive grounds. Reliable evidence usually can be quantified, 
and numerical facts are the results of quantification. It does not 
follow, of course, that quantification makes evidence reliable. 

The answers to criticism of statistical method in research usually 
reduce to the simple and practical one that it is demonstrably the 
most successful tool we have in dealing with numerous problems. 
If the alternative were recourse to oracles or logical deduction from
impeccable premises, we could dispense with statistics—and with 
other research tools, for that matter. But we can attack few of our 
problems armed only with a priori wisdom and logic. If we are to 
deal with many of our problems on reliable grounds, it is necessary 
to employ statistics. 

Let us note in passing that statistics is extremely broad in appli- 
cation, education being only one of the many fields which it serves. 
It is the tool par excellence for dealing quantitatively with phe- 
nomena in any field which are too complex for precisely controlled 
experimentation and too irregular for the rational treatment of 
mathematics. It is one of the basic tools of research in economics, 
psychology, and sociology, as well as in education. It is used widely 
in agriculture, anthropology, biology, and medicine and, to a 
limited extent, in the more exact sciences. An understanding of
statistical method is an aid to understanding the developments in 
many fields of study, and consequently makes an important con- 
tribution to the general education of the student. 

Meaning and Origin of Statistics. The word statistics has
three common meanings. In its oldest sense, it referred to any sort 
of facts, numerical or otherwise, which reflected the “conditions and
prospects” of society or state. However, the meaning of statistics 
in this sense has been narrowed, so that today, when the word is 
used to mean facts characterizing society and the physical environ- 
ment, numerical facts exclusively are implied. This meaning is well 
illustrated in the following passage from Johnson (Ref. 3, p. 1):*


Our entrance into and departure from this world are recorded as sta- 
tistical events. Birth and death, marriage and divorce, the school at- 
tendance of our children, the crops grown by farmers, the number of 
miles flown by commercial planes, the hours of our labor, the output of 
manufacturing plants, the acres of wood demanded for paper, the hours 
of sunshine, the inches of snowfall—all such events and activities are 
recorded somehow and somewhere. Myriads of such experiences and 
events affecting the daily lives of roundly two billion human beings 
lie behind the statistical data condensed in volumes, published and 
unpublished. 


During the latter part of the nineteenth century, the word sta-
tistics acquired a second meaning. It came to refer to the theories
and techniques involved in collecting, summarizing, and inter- 
preting numerical facts, as well as to the facts themselves, i.e., 
statistics came to mean method or methods of dealing with numerical 
facts. (Ordinarily, when used to imply methodology, statistics is 
singular and takes the singular verb.) Statistical method originated 
in the calculation of insurance rates on ships, in the study of the 
operation of chance in games and human affairs, and in the investi- 
gation of errors of observation in astronomy. Statistical method 
was applied to the social sciences by Quetelet (1796-1874) in Bel-
gium and Galton (1822-1911) in England, both of whom saw in
the method a quantitative and powerful tool for dealing with the 
mass data characterizing man and society. Primarily owing to the 
work and influence of Karl Pearson (1857-1936) and R. A. Fisher 


* References referred to by number are listed at the end of the chapter. 
(1890-), English scientists and mathematicians, both theoretical
and applied statistics expanded rapidly during the first third of the 
twentieth century. At present, as has been noted, the method of 
statistics is widely used in various fields of research. 

In recent years, a third meaning has been given to statistics. 
When numerical facts are reduced to summary figures, such as 
averages, ranges, and percentages, the derived figures are frequently 
referred to as statistics and a single one as a statistic. In this sense, 
the arithmetic mean of a set of numerical data is a statistic. 

The three meanings of statistics are brought out rather well in a 
student's jest, “It’s all perfectly clear; you compute statistics from 
statistics by statistics.” The three uses of the term rarely cause real
confusion, however, since the particular meaning is usually quite 
clear in context. 

The Necessity of Statistics in the Reduction of Data. As a 
tool in research, statistical method renders two invaluable services. 
The first is that of enabling us to classify, organize, and summarize 
numerical facts so that they can be more readily comprehended and 
interpreted. Consider Tables I and II, Appendix B. As listed, the
information about the eighth-graders and college students is diffi- 
cult to interpret. The many questions, such as how different eighth- 
grade sections compare, how the average achievement of all grades 
compares with achievement in other cities, whether achievement in 
one subject field tends to be better than in others, how the college 
freshmen compare with previous classes and with freshmen in other 
colleges, and so forth, cannot be studied without first reducing the 
masses of data into more compact forms. The information must be 
classified and summarized before the mind can comprehend its 
salient features. In further illustration, suppose a weather bureau 
has faithfully observed hourly temperatures during the past ten 
years. The bureau would have 24 × 365 × 10 temperature read-
ings, and unless some sort of reduction and summarization scheme
were used, the very thoroughness of the observations would make 
them hopelessly complicated.
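
The arithmetic of the weather bureau illustration can be sketched in a few lines. This is only an illustration: the temperatures are simulated stand-ins (an assumption, not real observations), leap days are ignored as in the text, and the names used are hypothetical.

```python
import random
import statistics

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365   # leap days ignored, as in the text
YEARS = 10

# Number of hourly readings accumulated over ten years.
n_readings = HOURS_PER_DAY * DAYS_PER_YEAR * YEARS

# Hypothetical temperatures: simulated values stand in for real observations.
rng = random.Random(0)
readings = [rng.gauss(55.0, 18.0) for _ in range(n_readings)]

# Reduce the mass of readings to a few summary figures.
summary = {
    "count": len(readings),
    "mean": statistics.fmean(readings),
    "lowest": min(readings),
    "highest": max(readings),
}
print(summary)
```

The 87,600 separate readings are replaced by four figures, which is the kind of reduction the paragraph above has in mind.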

In this connection, Fisher points out (Ref. 2, p. 6): 

... Any investigator who has carried out methodical and extensive
observations will probably be familiar with the oppressive necessity of
reducing his results to a more convenient bulk. No human mind is capable
of grasping in its entirety the meaning of any considerable quantity of
numerical data. We want to be able to express all the relevant information 
contained in the mass by means of comparatively few numerical values. 
... In all cases, perhaps, it is possible to reduce to a simple numerical 
form the main issues which the investigator has in view, in so far as the 
data are competent to throw light on such issues. The number of inde- 
pendent facts supplied by the data is usually far greater than the number 
of facts sought, and in consequence much of the information supplied by 
any body of actual data is irrelevant. It is the object of the statistical 
processes employed in the reduction of data to exclude this irrelevant 
information, and to isolate the whole of the relevant information con-
tained in the data.*


It is generally the case that a mass of numerical data is useful 
to the extent to which it can be summarized in tables or graphs and 
simply described in terms of frequencies, averages, variabilities, and 
relationships. It may be said that the first use of statistics in research 
is to “distill” raw data. The student need only consider the United 
States Census to convince himself that this service is indispensable. 

The Necessity of Statistics in Inference. The second invalu- 
able service rendered by statistics is that of enabling us to draw 
conclusions, of a statable degree of exactness, about the probable 
nature of objects and events upon less than complete evidence. 
To quote from Fisher again (Ref. 2, p. 41): 


. . . From a limited experience, for example, of individuals of a species, 
or of the weather of a locality, we may obtain some idea of the infinite 
hypothetical population from which our sample is drawn, and so of the 
probable nature of future samples to which our conclusions are to be 
applied. If a second sample belies this expectation we infer that it is, 
in the language of statistics, drawn from a different population; that the 
treatment to which the second sample of organisms had been exposed 
did in fact make a material difference, or that the climate (or the methods 
of measuring it) had materially altered. 


Most of the knowledge we derive from the matters of fact relating 
to some issue is of probable rather than of certain nature, because 
only a limited number or sample of the facts is available. The con- 
clusion that all men are mortal, for example, is not based upon the 


* Reprinted from R. A. Fisher, Statistical Methods for Research Workers, pub- 
lished 1950 by Oliver and Boyd, Ltd., Edinburgh, by permission of the author 
and publishers. 

† Op. cit.
study of the life histories of all men. Men are living today and men
are yet to be born. Rather, the conclusion is based upon the life 
histories of certain men who have been observed to be mortal and 
is really an inference we make from a sample. Similarly, knowledge 
based upon experiment that one method of teaching, say algebra, is 
superior to a second method is only an inference drawn from a 
sample. The experiment might be repeated over and over again, 
both at present and in the future. These are illustrations of situa- 
tions in which we cannot possibly examine all of the facts relevant 
to a particular conclusion, for the simple reason that they are by 
nature endless and cannot be made available. In such situations, 
the total number or population of pertinent facts is considered to 
constitute an infinite hypothetical population. Any investigation of
infinite populations necessarily is limited to samples. 

Some statistical populations, though finite, are so vast as to be
practically inaccessible. In determining, say, the average price of 
staple groceries to the consumer, it is practically impossible to ob- 
serve prices in all of the retail outlets even in a single city. In the 
various public opinion polls it would be practically impossible to
poll all of the members in the population about which it is desired 
to draw conclusions. In these situations, samples as large as time 
and money permit are selected and studied to determine the prob- 
able character of the population. 

In other situations, sampling may be necessary because the draw- 
ing of inferences demands the destruction of the cases studied. In 
seed germination studies, in testing milk for butterfat content, in 
determining the durability of manufactured articles, and so on, the 
investigation processes make the cases unfit for further use. Here 
again, knowledge about the population must be derived from a 
sample. 

The possibility of making inferences about a population from the
characteristics of a sample is fundamental in research work. Sta-
tistics provides a rigorous method of judging the reliability of infer-
ences drawn from a sample. The fundamentals of sampling theory
are complex and mathematically difficult, but a practical under-
standing of sound sampling procedures demands little more than
common sense. It is only common sense to recognize that reliable
inferences can be drawn only from representative samples and that
the larger the sample the more reliable the inference it permits. If
we desire to determine the mean height of Philadelphia women, for 
example, we would have more confidence in the information pro- 
vided by a sample selected at large than in one provided by the 
members of a women’s athletic club, and more confidence in a 
sample of 100 than in a sample of 10.

In research, statistics is necessary both in summarizing a sample 
of numerical facts and in drawing inferences from the sample. Both 
procedures are fundamental in deriving knowledge from numerical 
facts. A distinction is sometimes made between summarizing or 
descriptive statistics and sampling statistics. Such a distinction is 
arbitrary and can be misleading. There are few research problems 
free from sampling considerations. Let us examine the elementary 
aspects of sampling theory more closely in their relation to the col- 
lection and use of evidence. 

Selecting a Representative Sample. An argument based on 
sample evidence runs something like this: (1) The individuals in 
the sample are representative of a population of individuals; (2) 
certain facts are observed to characterize the individuals in the 
sample; (3) therefore, probably and approximately, the observed 
facts characterize the individuals in the population. The argument 
is simple, but the conditions which make it convincing are difficult 
to meet. In a later chapter we shall give unambiguous meaning to 
the words probably and approximately as they are used in statistics. 
At this point let us consider several conditions of sampling. 

Sample evidence has no demonstrable generality unless we know 
the population which was sampled and how the sample was selected. 
A statistical population, ordinarily referred to as population or uni- 
verse, may consist of the attributes or performances of a specified 
group of persons; crop yields in a locality; elements of climate in a 
region for a stated period; characteristics of rural, village, or city 
schools of specified size and location; attributes of manufactured 
articles of a given kind coming off an assembly line; characteristics 
of houses, farms, animals, and the like in a certain locality, state, 
or nation; or any other set of specified objects or events which pos- 
sess a common characteristic in varying amounts or which have been 
selected according to a single principle. Statistical populations can 
always be expressed as numerical facts. Although this is putting 
the cart before the horse, it may be said that all of the individuals
in a group to which a conclusion based upon sample evidence is to 
be applied must be thought of as members of the population from 
which the sample was selected. 

The exact specification or definition of the population involved 
in a research study is prerequisite to selecting a sample, both because
it minimizes the danger of utilizing a nonrepresentative or biased 
sample and because it sets forth the logical limits of the inferences 
which are drawn from the sample evidence. The specification should 
set forth clearly just what objects or events are considered to con- 
stitute the population. 

After the population is specified, a “representative” sample is 
selected. The great and possibly insurmountable problem in sam- 
pling is that of determining whether a sample is adequately repre- 
sentative of the population. Several schemes for judging sample 
representativeness have been proposed, none of which is satisfac- 
tory. Most of the schemes are based upon increasing the size of the 
sample until the evidence is stable, i.e., unaffected by new cases. 
If the method of sampling is unbiased, such schemes may aid the 
researcher in judging whether his sample is adequate as to size; if 
the method is biased, however, so that an unrepresentative sample 
will result, adding new cases will merely demonstrate the consistency 
of the method. Such schemes assume representativeness, which is 
the very thing in doubt. 
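
The point that adding cases cannot cure a biased method may be illustrated by a small simulation. It is a sketch under stated assumptions: the population of heights is simulated, and the "club" is a hypothetical biased source made up of the taller members only.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of heights in inches (simulated, not real data).
population = [rng.gauss(64.0, 2.5) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# Hypothetical biased source: only individuals taller than 66 inches.
club = [h for h in population if h > 66.0]

results = {}
for n in (10, 100, 1000):
    biased_mean = statistics.fmean(rng.sample(club, n))
    random_mean = statistics.fmean(rng.sample(population, n))
    results[n] = (biased_mean, random_mean)
    print(n, round(biased_mean, 2), round(random_mean, 2))

print("population mean:", round(population_mean, 2))
```

As the biased sample grows, its mean becomes stable, but it settles on the wrong value; stability shows only that the method is consistent, not that the sample is representative.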

Although there is no way of making sure that a sample is repre- 
sentative, both theoretical considerations and experience indicate 
that a sample selected at random, i.e., by chance, is the most trust-
worthy. When the method of sampling assures every individual in the
population the same chance of being drawn as any other individual, 
the sample is said to be random. There are several methods of random
sampling, all of which require that every individual in the popula- 
tion be listed. One of the most satisfactory and easiest to apply is 
based upon the use of random numbers, such as those included in 
Table H, Appendix C. Let us demonstrate both the use of random
numbers in sampling and the representative nature of a random 
sample by selecting a sample of 20 from the population of 400
scores in Table III, Appendix B.

We may begin by pointing at random to a number in Table H. 
Let us suppose that we have pointed to the number 8 in row 35,
column 7. From this point, or any point so selected, we may read 
up, down, sidewise, or diagonally, but once we have entered the 
table we should proceed in orderly fashion. Since the serial numbers 
of the 400 individuals in the population are three-digit numbers, 
we shall need to include three random digits at each reading. Let us 
agree to read upward from the starting point in row 35, column 7, 
and to include the digits in columns 7, 8, and 9. The first three-digit 
number we encounter for which there is a corresponding serial 
number in the population is 097, the second 380, the third 347, and 
so on. When we reach the top of the columns, let us cross over to 
columns 10, 11, and 12, and read downward until we have 20 random 
numbers. The numbers so selected and the corresponding individual 
scores in the population are: 


Number  Score     Number  Score     Number  Score     Number  Score
 097     56        085     28        254     22        361     39
 380     44        379     38        051     45        186     48
 347     40        140     54        179     28        109     38
 067     45        262     41        223     28        049     50
 362     40        000     30        195     64        317     37


There are various ways of using a set of random numbers in sam- 
pling. The important precaution to observe is that of following a 
systematic pattern of selecting digits after the table is entered, or 
of making as many random entries as there are numbers needed; 
in other words, to make sure that the randomness of the digits is 
permitted to prevail.
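
The procedure of reading three-digit numbers from a table may be sketched as follows. Several things here are assumptions made for illustration: a pseudo-random generator stands in for the printed table, serial numbers are taken to run from 000 to 399, repeated numbers are skipped, and the function name is hypothetical.

```python
import random


def draw_serials(digit_triples, population_size, sample_size):
    """Keep three-digit readings that match a serial number, skipping repeats."""
    chosen = []
    for triple in digit_triples:
        serial = int(triple)
        # Discard readings with no matching serial number, and repeats.
        if serial < population_size and serial not in chosen:
            chosen.append(serial)
        if len(chosen) == sample_size:
            break
    return chosen


# Stand-in for reading a printed table: pseudo-random three-digit strings.
rng = random.Random(35)
readings = (f"{rng.randrange(1000):03d}" for _ in range(10_000))

sample = draw_serials(readings, population_size=400, sample_size=20)
print([f"{s:03d}" for s in sample])
```

Readings such as 812, for which no serial number exists in a population of 400, are simply passed over, just as in reading the printed table.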

Let us examine the representativeness of the sample we have ob- 
tained. The mean of our sample is about 41; the mean of the popula- 
tion is about 40. Thus, the sample provides a good approximation 
to the mean value of the population. The sample is too small to 
provide dependable estimates of other population characteristics, 
such as form of distribution and variability. (See exr. 5.) 
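
The sample mean quoted above can be checked directly from the 20 scores listed earlier. The variable names are merely illustrative.

```python
# The 20 sample scores, read column by column from the list above.
scores = [56, 44, 40, 45, 40,
          28, 38, 54, 41, 30,
          22, 45, 28, 28, 64,
          39, 48, 38, 50, 37]

sample_mean = sum(scores) / len(scores)
print(sample_mean)  # 40.75, i.e., about 41
```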

One of the commonest questions in sampling is that regarding 
adequate sample size. There is no simple answer to the question. 
Adequacy of size depends upon the sort of evidence being sought 
and the degree of reliability desired. The latter depends, to a large 
extent, upon the homogeneity of the population. The word homo-
geneity in statistics refers to the degree of similarity characterizing
the individuals in the population in a given respect, e.g., similarity
in respect to height or IQ. If the individuals were exactly alike in the 
given respect, the population would be perfectly homogeneous in 
that respect, and a sample of one individual would be adequate. 
Populations are never perfectly homogeneous, and the property of 
homogeneity is relative. It is quite obvious that the less homogeneous 
a population is, the larger a random sample needs to be in order to 
provide evidence of a given degree of reliability. In later chapters, 
a great deal of attention will be given to the question of adequacy 
of sample size, in light of the sort of evidence sought and the 
homogeneity of the population. 
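
The relation between homogeneity and sample size may be illustrated, though not proved, by simulation. In the sketch below both populations are simulated (an assumption), and the "spread" is simply the standard deviation of many sample means, used here as a rough indication of reliability.

```python
import random
import statistics

rng = random.Random(1)


def spread_of_sample_means(population, n, trials=500):
    """Standard deviation of the means of repeated random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.pstdev(means)


# Two hypothetical populations with the same mean but different homogeneity.
homogeneous = [rng.gauss(100.0, 5.0) for _ in range(4000)]
heterogeneous = [rng.gauss(100.0, 20.0) for _ in range(4000)]

spreads = {}
for n in (10, 100):
    spreads[("homogeneous", n)] = spread_of_sample_means(homogeneous, n)
    spreads[("heterogeneous", n)] = spread_of_sample_means(heterogeneous, n)
    print(n,
          round(spreads[("homogeneous", n)], 2),
          round(spreads[("heterogeneous", n)], 2))
```

For a given sample size, the means of samples from the less homogeneous population scatter more widely, so a larger sample is needed to reach the same degree of reliability.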

In a real research problem, of course, we cannot examine repre- 
sentativeness and adequacy of size of the sample in light of facts 
about the population, since these are the very facts we are attempt- 
ing to determine from the sample. Extensive experience, however, 
has indicated that a random sample, particularly if it is large, pro- 
vides an approximate replica of the population. In a later chapter 
we shall see that a random sample, in addition to providing trust- 
worthy evidence regarding the population, provides a rational basis 
for estimating the amount by which a sample estimate may be in 
error. 

There are several kinds of random samples, one of the most useful 
of which is the stratified random sample. The nature of this sample is 
most easily made clear by an illustration. 

Suppose it is known or suspected that teacher opinion regarding 
“merit salary schedules” in a large city is related to years of teach-
ing, and suppose it is possible to survey opinion only in a sample
of 1,000 selected from the population consisting of 20,000 teachers 
in the city. The 20,000 teachers would first be classified or “strati-
fied” according to some breakdown of experience and then samples
of sizes proportional to the numbers of teachers in the various 
strata would be selected at random. If the population were stratified 
by experience as indicated on page 12 the stratified random sample 
would consist of 5 random samples of sizes as shown. It will be noted 
that the specified experience groups are proportionately represented 
in the total sample. It is obvious that, if there is a relation between
experience and opinion regarding merit salary schedules, the total
sample would be expected to yield a better estimate of opinion in
the population than a simple random sample. There would, of
course, be various possible breakdowns of experience, such as less 
than 3 years, 3 to 5 years, 6 to 8 years, and so on, but the mechanics 
of selecting the sample would be the same. It is usually the case that 
possibly significant strata are suggested by preliminary investiga- 
tion, results of previous studies, or intuition. 


                      NUMBER OF TEACHERS    NUMBER OF TEACHERS
YEARS EXPERIENCE      IN POPULATION         IN SAMPLE

4 or less                   2,500                  125
5 to 9                      5,000                  250
10 to 14                    6,000                  300
15 to 19                    4,500                  225
20 and over                 2,000                  100

TOTAL                      20,000                1,000
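The sample sizes in the table follow from simple proportion: each stratum contributes the same fraction (1,000 in 20,000) of its members. The following is a minimal sketch of that arithmetic and of the random draw within strata. The function names and the numbered roster of teachers are assumptions made for illustration; only the stratum counts come from the table.

```python
import random

# Stratum sizes from the teacher-experience table in the text.
population_by_stratum = {
    "4 or less": 2500,
    "5 to 9": 5000,
    "10 to 14": 6000,
    "15 to 19": 4500,
    "20 and over": 2000,
}


def proportional_allocation(strata, sample_size):
    """Number of individuals to draw from each stratum."""
    total = sum(strata.values())
    # Integer division is exact here because every share is a whole number;
    # in general the shares would have to be rounded with some care.
    return {name: sample_size * count // total for name, count in strata.items()}


def draw_stratified_sample(strata, sample_size, seed=0):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    allocation = proportional_allocation(strata, sample_size)
    # Hypothetical roster: teachers in a stratum are numbered 0..count-1.
    return {name: rng.sample(range(strata[name]), k)
            for name, k in allocation.items()}


allocation = proportional_allocation(population_by_stratum, 1000)
print(allocation)
```

The printed allocation reproduces the right-hand column of the table (125, 250, 300, 225, and 100).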


The possibility of distinct population strata with respect to some 
variable, such as age, sex, educational level, residence, or socio- 
economic class, should be kept in mind, both because random sam- 
pling within strata tends to provide a more representative sample 
and because the differences between strata may be informative in 
themselves. If the variable upon which the stratification is made is 
unrelated to the characteristic of the population under investiga- 
tion, the stratified random sample is, of course, no more likely to 
be representative of the population than the simple random sample; 
in fact, when this is the case, the former is no different from the 
latter. Populations may be stratified on more than one variable 
(See exr. 8). 

There are several ways of sampling from stratified populations.
In case more is known or suspected about the population than the 
numbers in strata, e.g., the relative variability in strata, the simple 
proportional sample discussed above is usually not the most trust- 
worthy. (See p. 445.) 

When sampling from a population in which the individuals can- 
not be listed, it is impossible to use random numbers or some other 
lottery method of selecting a sample. There is no altogether satis- 
factory way of selecting a sample in such situations. The best that 
the researcher can do, perhaps, is to make a determined effort to 
avoid bias and to report in detail the sampling method used and 
the composition of the sample which resulted.

Nonrandom Samples. Samples which result from other than
random methods of selection are sometimes called accidental, incidental,
or uncontrolled. They are here designated nonrandom, because
that term is the most expressive of their nature.

A great many of the experimental studies dealing with instruc- 
tional methods involve such samples, as do the majority of the 
questionnaire investigations. In the former, the convenient intact 
group, perhaps the students in a certain class or room, is the sample 
used; in the latter, self-selected samples are established by the 
respondents who return questionnaires. 

It is observable that the majority of the samples used in edu- 
cational research are nonrandom; and it is likely, because of admin- 
istrative and other practical difficulties, that the practice cannot be 
avoided, at least in many instances. 

What can be said about the nonrandom sample? Since it cannot 
be considered to be representative of any known population, the 
information it yields, strictly speaking, does not permit generalization.
Whether the findings hold for another group can be determined
only by a “try-and-see” procedure, which really means repeating the
investigation. It has been argued that the researcher can generalize
to some imagined population which would be fairly represented by
his sample, but it is difficult to extract any real sense from the
argument.

However, it would be incorrect to conclude that the study of a 
nonrandom sample is without significance. The investigation may 
be worth while, both because the sample evidence may be important 
in itself and because the investigation may suggest significant prob- 
lems and hypotheses for more extended and general study. Further- 
more, there is always the possibility that a nonrandom sample is 
adequately representative of other groups, so that what has been 
observed will have some generality. Because of this possibility, a 
nonrandom sample should be described in detail with respect to 
the factors which may have influenced the findings. In applying the 
findings from a nonrandom sample to another group, we necessarily 
proceed by analogy, i.e., we reason from the particular to the particular.
Such reasoning is sound only when there are real similarities
and no crucial differences between the particulars involved. Complete
description of the nonrandom sample is needed in order to
compare it with other groups to which the findings may be applied
and thereby to judge whether the conditions of sound analogy are
met.

We shall return to the question of judging the reliability and significance
of sample evidence in a later chapter. Before we can deal
with these fundamental issues, it is necessary to develop several statistical
concepts relating to frequency distributions and the measurement
of their important properties. As he learns about these, the
student should cultivate the habit of thinking about a given set of 
facts as a sample of a much larger body of facts which could be 
obtained if time and money were lavished. The student is urged also 
to remember that no formal rules can take the place of experience 
and common sense in the selection and interpretation of samples. 

McNemar (Ref. 4) gives a comprehensive semitechnical discussion 
of sampling in psychological research, and Stephan (Ref. 7) deals 
with several important practical problems that arise in sampling. 

Misuse of Statistics. Everyone has heard such gibes as “There 
are little liars, big liars, and statisticians,” “You can prove anything
provided you use statistics,” and “Statistics supports many
mistaken things including statisticians.” It must be admitted that
statistics in the hands of the neophyte is an uncertain tool, and in
the hands of the propagandist, a dangerous one. Most exhibits of 
numerical data admit of more than one kind of analysis, and if a 
person sets out to analyze any given evidence according to some 
point he wishes to establish, he probably will be able to construct 
an argument plausible enough to fool at least the ignorant. 

Most of the arguments against the use of statistics in social re- 
search can be reduced to the single one that statistics can be and 
frequently is misused. Throughout the following chapters consid- 
erable attention will be given to the proper use of statistics; at this 
time, it is desirable only to preview and to illustrate briefly several 
of the more flagrant misuses. 

A great many of the misuses of statistics arise through the fol- 
lowing practices: 


a. Using an average value to represent a set of numerical data when
the average obscures important features of the data. For example,
if the yearly salaries of three workers are $10,000, $1,000,
and $1,000, respectively, the arithmetic mean, $4,000, would
obscure more information than it would convey. If the mean were
the only information reported, the data would in effect be falsified.
(See Chapter III.)

b. Ascribing the characteristics of a set of numerical data to a particular
case. In any empirical study, a great deal of caution must
be exerted in the interpretation and application of the findings.
In the example above, the average man cannot be said to earn
$4,000 in a real sense. To illustrate further, the generalization
that students with higher IQ's make better scholastic records
cannot be applied to an individual student without careful
qualification and appreciable uncertainty. The generalization is
reliable and important, but a great many other variables, some
not yet measurable, affect a particular student's achievement.

c. Making unwarranted inferences from a sample (overgeneralizing).
The inferences about a population drawn from sample evidence
are subject to sampling error. The evidence provided by
even a large representative sample must be interpreted in light
of possible error. The information provided by a small or nonrandom
sample may provide no useful generalization at all. The
most dramatic example of making unwarranted inferences is
that of the well-known Literary Digest poll of 1936. Unfortunately,
few unwarranted inferences are so quickly detected or so soundly
ridiculed.

d. Making comparisons without reference to all of the pertinent
data or making comparisons without a clearly defined base. To
say that city teachers are better paid than rural teachers has
little meaning unless cost of living is taken into account. Sensible
interpretations of wages cannot be made independently of prices.
In further illustration, the economic value of a college education
cannot be established by comparing salaries of college men with
salaries of noncollege men. Other circumstances, such as original
wealth, social status, and intelligence, vitiate such comparisons.

e. Inferring causation because of association. Statistical method
cannot directly demonstrate a causal relationship between two
variables: it can only provide a measure of the amount of association.
Whether two associated variables, say education and
income, are related as cause and effect is a question which demands
information beyond the fact of association. We shall
return to this important topic in connection with correlation and
regression. (See Chapter VI.)

f. Carrying out computations and reporting statistics to a degree
of exactness which the original observations do not warrant.
Statistical manipulation of figures does not increase their accuracy.
We shall consider this matter later in the present chapter.
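The salary illustration in item a can be checked directly. The sketch below computes the arithmetic mean of the three salaries and, for contrast, the median; the median is our own choice of a second summary for illustration and is not part of the example as stated above.

```python
from statistics import mean, median

# Yearly salaries of the three workers in item a.
salaries = [10_000, 1_000, 1_000]

mean_salary = mean(salaries)      # arithmetic mean, as in the text
median_salary = median(salaries)  # middle value, shown only for contrast

# Number of workers who actually earn as much as the mean.
at_or_above_mean = sum(1 for s in salaries if s >= mean_salary)

print(mean_salary, median_salary, at_or_above_mean)
```

Only one of the three workers earns as much as the mean, which is the sense in which the mean, reported alone, misrepresents the group.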


Any thoughtful student can detect such misuses and fallacies as 
those listed above; however, not all misuses and misinterpretations 
of statistics are so easy to detect. In the best of hands, statistical 
analysis of numerical data, like any other attempt to gain reliable 
knowledge, is not infallible; even competent statisticians may dis- 
agree about the treatment and meaning of some data. The method 
of statistics constitutes a necessary, but not sufficient, condition of
drawing sound conclusions from mass numerical data. To ascribe 
to any numerical fact, such as an average IQ or price index, one 
and only one possible interpretation would be sheer numerology. No 
research tool can take the place of a mind in its user. As we study the 
more technical aspects of statistics, however, we shall find that sta- 
tistics not only aids in collecting and analyzing evidence, but also 
is the most effective tool we have in determining whether evidence 
is fairly collected and properly analyzed. Moreover, since they 
explicitly demand unbiased and reliable evidence, statistical meth- 
ods tend to be self-correcting. 

It is one of the aims of a course in statistics to bring about a
thoughtfully critical attitude regarding conclusions purportedly 
based upon facts. Generally speaking, such an attitude coupled with 
common sense and an understanding of elementary statistical 
method enables a person to detect misuses of statistics and to reason 
well about statistical data. It has been said that, to a large extent, 
the correct use of statistics depends upon common sense and simple 
arithmetic. 


Exercises 


1. A group of school officials had to decide whether to permit the formation
of student fraternities. Discuss several methods by which a decision
might be reached.

2. Below are several of the perennial problems of education. Select one
and discuss several ways of attempting to find a solution.

a. To what extent should the federal government aid in the financial
support of education?

b. Should professional graduate schools of medicine, engineering, law,
and so on require wide undergraduate study of the liberal arts?

c. Should promotion from grade to grade in the elementary school be
based upon standards of achievement?

d. Should “teaching efficiency” be considered in salary schedules?

3. Suggest an educational problem which you believe can satisfactorily
be dealt with (a) authoritatively, (b) traditionally, (c) intuitively, (d)
factually. What are the advantages and limitations of each method?

4. What are some populations the following samples might be from: (a)
48 readings of a thermometer, (b) a quart of crude oil, (c) a yard of
cloth, (d) all of the students in a high school, (e) the preserved letters
of Samuel Johnson?

5. By use of random numbers, select 5 samples of 10 scores each from the
population of 400 scores in Table III, Appendix B. Tally the scores in
the classes shown below, then combine the 5 samples to form a sample
of 50. Is the sample of 50 more likely to be representative of the population
than any one of the samples of 10? Explain in terms of the operation
of chance.

                       SAMPLE
            1     2     3     4     5     Combined

TOTAL      10    10    10    10    10        50

6. “A sample may be large yet worthless, because it is not random, or
it may be random but unreliable, because it is small.” Explain.

7. A single drop of blood from the finger tip can be relied upon to give a
fair blood count for an individual, but a random sample of, perhaps,
100 ten-year-olds cannot be relied upon to give a fair estimate of the
proportion of Philadelphia ten-year-olds in various mental age brackets.
Explain.

8. Returning to the illustration of the stratified random sample, suppose
it is known or suspected that opinion about merit salary schedules is
related to sex as well as to experience. If the ratio of men to women
teachers in each experience group is about 3 to 5, how would the stratified
random sample be selected?

9. Do these statements mean the same thing: “The burden of proof of a
representative sample is always on the investigator” and “When an
investigator generalizes his results from a sample to a population, it is
his responsibility to indicate the logic and basis of the generalization”?

10. A classroom teacher tried out a new method of teaching on his class,
and decided to use the new method thereafter. What is the sample?
The population?

11. A quality control statistician examined 100 pencils coming from the
production line and tested them for uniformity of lead. Under what
conditions can he generalize to all of the pencils coming from the line?

12. A social science teacher had 120 students in 5 classes. He gave them a
questionnaire on social attitudes, then invited them to accompany him
on a tour of the slums. Thirty-five accepted the invitation. After the
visit he gave a second questionnaire on social attitudes and found that
the attitudes of the 35 were significantly changed. He concluded that
visiting the slums changed social attitudes. Comment.

13. Criticize the following, stating one or more fallacies in each.

a. A charitable organization received 5 donations during the week
consisting of $25,000, $10, $5, $5, and $1 and announced that the
average donation was a little over $5,000.

b. In a production line experiment, involving 5 workers and 2 methods
of assembling an article, the length of time in minutes required by
the workers under Method I was 20, 18, 18, 17, and 12; under
Method II, 42, 30, 9, 7, and 7. It was concluded that Method I
was superior.

c. A researcher sent questionnaires to a random sample of 50, of which
31 were returned. Of the 31, 26 favored a proposal. The researcher
reported that 83.87 per cent of the sample favored the proposal and
that consequently it was safe to conclude that at least 75 per cent
in the population favored the proposal.

d. News headline: “Education Boosts Farmers’ Income.” Higher education
is a good investment for farming, John Doe, extension economist
at Blank University, reported yesterday. Doe said in a statement
that a survey from 1907 to 1936 showed that farmers with a college
education earned five times as much as those who attended only
grade school. “Young farmers today,” he commented, “need to be
as well prepared for their business as do doctors and lawyers.”

14. Sir Francis Galton observed, “General impressions are never to be
trusted. Unfortunately when they are of long standing they become
fixed rules of life, and assume a prescriptive right not to be questioned.
Consequently those who are not accustomed to original
inquiry entertain a hatred and horror of statistics. They cannot
endure the idea of submitting their sacred impressions to cold-blooded
verification.” Mention several “general impressions” capable
of being studied statistically.


Statistical Data 


The beginning student is likely to have some difficulty in his 
reading and communicating, due to the lack of a standard vocabulary
in statistics. In a statistical study, the variables that are being
investigated and the numerical facts relating to them are designated 
by various names. In the interest of ease and clarity of communica- 
tion, it is desirable to define here several of the more common sta- 
tistical terms and to illustrate their uses. 

Data. The term data (plural of datum, meaning fact) is an all- 
embracing term used to designate the evidence or facts which de- 
scribe a group or a situation and from which inferences or conclu- 
sions are drawn. Numerical facts such as heights, weights, scores 
on educational tests, prices of goods, crop yields, salaries, school 
enrollments, numbers of people voting for presidential candidates, 
and so on conveniently may be referred to as statistical data or just
data. The broad meaning of data makes the term extremely useful,
provided, of course, its referents are clear.

Phenomenon; Variable; Variate. Statistics is often referred 
to as the study of phenomena which are characterized primarily by 
variation. In the most general sense, statistical data consist of the 
numerical observations made of varying phenomena. The term 
phenomenon refers to some aspect of the environment, such as 
weather, a human trait or activity, or an economic or social cir- 
cumstance, which can be observed or measured. In statistics and 
in science, phenomenon does not imply something extraordinary or
prodigious, as it frequently does in common usage; it refers simply
to an object or event capable of being perceived. 

The term variable refers to a phenomenon which shows variation 
from place to place, from time to time, or from object to object.
Any phenomenon which can be investigated statistically is a 
variable. 

The student will find that some writers use the term variate inter- 
changeably with variable, while others use it to refer to a particular 
value of a variable, e.g., a single IQ among the several which result
in measuring intelligence in a group. Thus, the plural variates may
mean different variables or may mean the different values of a given 
variable. Although the meaning of the word ordinarily is clear in 
context, we will avoid its use hereafter.

There are two broad classes of variables, (1) those which vary in
amount and (2) those which vary in kind or quality. The observations
of a variable of the first class are fundamentally different from
the observations of a variable of the second class, as will be brought
out below.

Statistical Series. A set of numerical facts or items originating 
in the observation of a single variable commonly is called a statistical
series or just a series. The term is a fortunate one, for it sums
up the salient features of a set of facts which can be dealt with
statistically, namely, a common characteristic and variation in size 
or kind. 

When the items in a series vary in size, they are capable of being 
classified in order of size and therefore constitute a quantitative series.
A quantitative series is continuous if the values of its items differ
by amounts which are indefinitely small, or if they theoretically
would so differ if measured more precisely. Examples of continuous
series are chronological ages or test scores of a group of individuals, 
temperatures, crop yields, and inches of rainfall. The items in such 
a series may differ theoretically by indefinitely small amounts. 
Continuous statistical series are always quantitative, but not all 
quantitative series are continuous. 

When the items in a series must be expressed in whole numbers,
the series is discontinuous or discrete. Discrete series usually are
made up of items whose values have been determined by counting.
Class sizes, school enrollments, census data, and the number of
heads observed in repeated tosses of a set of coins are examples of
discrete series. We shall see later that it is frequently useful to treat
discrete series as continuous and vice versa.

The items of a statistical series originating in the observations of
a phenomenon which varies in kind or quality cannot be arranged
from least to most under a single classificatory principle. For example,
the numbers of people belonging to various religious denominations
in a city cannot be grouped from least to most under
a single heading. The variable is “church membership,” and the 
variation of the items is that of kind, namely, Baptist, Catholic, 
Episcopalian, and so on. Such a series is called qualitative. Other 
examples of qualitative series are those relating to occupation, race, 
sex, hair and eye coloration, political affiliation, and enrollments in 
different kinds of schools and colleges. 

In summary, the observations of a variable may yield either a 
quantitative or a qualitative series, depending upon whether the 
variation is in amount or in kind. If the variation is in amount, the
variable is considered to be quantitative; if in kind, qualitative. 
In statistics, qualitative variables frequently are referred to as
attributes. As will be shown later, the statistical methods of studying
attributes are necessarily different from those of studying quantitative
variables, although the methods connect at several points.

Statistical Items. The items of a statistical series are variously 
designated. There is no one best or commonest name for them. The 
terms observations, scores, values, and measures are used synonymously
to refer to the items of a quantitative series. When the series
is continuous, the range over which the items spread is commonly 
referred to as the scale of values or the scale of scores. 

As has been emphasized, statistics deals with numerical facts. 
The items in a qualitative series are not capable of statistical treatment
until they are entered as numbers under the categories characterizing
the qualitative variable. When entered as a tally in a
category, an item becomes an instance or a case of a quality of the
variable. All of the cases in a category constitute the frequency in
that category. Thus, frequencies in categories are the numerical facts
characterizing a qualitative variable.

Proportion; Percentage. Consider the qualitative series shown in
Table 1.1. There are 5 occupational categories, and the frequency in
each category is shown. The total frequency (equal, of course, to
the number of cases in the series) is 26.

In comparing the frequency in a particular category with the 
total frequency, it is customary to take the ratio frequency divided
by total frequency. The ratio of a frequency in a category to the total
frequency, i.e., the relative frequency, commonly is called a proportion
and is symbolized by p. Thus the proportion of skilled occupations
in Table 1.1 is 8/26 or .3077.


TABLE 1.1
OCCUPATIONS OF FATHERS OF
26 EIGHTH-GRADE PUPILS

OCCUPATION              FREQUENCY

Professional
Business
Skilled Labor                8
Semiskilled Labor
Unskilled Labor

TOTAL                       26


When a proportion is multiplied by 100 the product is a percentage,
symbolized as P. Proportions may always be transformed into percentages
without loss of meaning, but the converse is not true. Percentages
frequently are used to mean something other than a fractional
part of a whole, a meaning which proportions, by definition,
cannot have.
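The arithmetic of proportion and percentage is brief enough to set down as a sketch. The function name proportion is our own; the figures 8 and 26 are those quoted in the text for skilled occupations in Table 1.1.

```python
def proportion(frequency, total_frequency):
    """Relative frequency: a category's frequency divided by the total."""
    return frequency / total_frequency


p = proportion(8, 26)   # proportion of skilled occupations
P = 100 * p             # the same fact expressed as a percentage

print(round(p, 4))  # 0.3077
print(round(P, 2))  # 30.77
```

Because every case falls in exactly one category, the proportions for all the categories of a series necessarily sum to 1, and the percentages to 100.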


Exercises 


15. By reference to the data of Table I or II, Appendix B, illustrate each
of the following: attribute, category, continuous series, discrete series, 
qualitative series, quantitative series, variable. 

16. Give original illustrations of the terms in exr. 15, above. 

17. In what sense can scores in a psychological or educational test, ordi- 
narily stated in whole numbers, be considered continuous? 

18. In each of the following, state whether the series is continuous or
discrete:

a. Mental ages of delinquents.

b. Deposits during a day in a given bank.

c. Number of times 12, 11, . . . , 2 spots appear in rolling a pair of
dice 50 times.

d. Sprinting times of participants in a 100-yard dash.

e. Scores in an intelligence test.


19. Give an illustration where college enrollments would constitute (a) a
discrete quantitative series; (b) a qualitative series. 

20. What sort of statistical series would the responses to each of the following
questionnaire items constitute:

a. What is your age to the nearest birthday?

b. What is your height? Your weight?

c. What is your occupation?

d. Are you married?

e. What is your income?

f. Have you ever had a major operation? If so, for what? 


21. What are the proportions or relative frequencies in the various occupational
categories of Table 1.1? What is the sum of the relative
frequencies?


Exact and Approximate Numbers and Computations


In general, statistical data are obtained by counting the number
of objects or events in a specified group or by taking measurements
of a continuous variable. As we have seen, numbers obtained by
the first method form either a qualitative or a discrete quantitative 
series, depending upon whether the phenomenon under observation 
varies in kind or in size, while numbers obtained by the second 
method form a continuous series, at least in theory. 

The rules and interpretations governing numbers obtained by 
counting are somewhat different from those governing numbers ob- 
tained by measuring, as will be brought out in the following pages. 

Exact Numbers. In contrast to the necessarily approximate nature
of measurements, numbers obtained by counting objects are
exact, provided of course no mistake is made in the count. Practically,
these are the only exact numbers possible. The theoretically
exact quantities, such as π and the base of natural logarithms e, are
not capable of being reduced to exact numerical values. That counting
may be tedious or errors made in the count is beside the point;
it is possible to count the objects in a given group exactly. Numbers
obtained by counting are considered to be exact.

Computing with Exact Numbers. The result of a computation
involving only exact numbers is theoretically exact and may be
carried out to as many decimal places as desired. If a school enrolling
650 pupils employs 30 teachers, the pupil-teacher ratio is 21.666 . . .
to as many decimal places as desired. If, in a group of 90 individuals,
40 have brown eyes, the proportion of brown-eyed individuals
in the group of 90 is 0.4444 . . . , and the percentage of
brown-eyed individuals is 44.44 . . . . If 25 pupils each contribute
15 cents to a flower fund, the amount raised will be exactly 25 ×
$0.15 or $3.75. When such computations are performed without
error, the result is either exact or can be made to approach exactness 
as closely as we please. 
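These computations can be carried out without any rounding by keeping the numbers as ratios of whole numbers. The sketch below uses Python's standard fractions and decimal modules; the helper decimal_expansion is a hypothetical name introduced here, and it truncates rather than rounds, so that it shows the digits of the unending decimal as far as asked.

```python
from decimal import Decimal
from fractions import Fraction

pupil_teacher_ratio = Fraction(650, 30)   # held exactly, as 65/3
brown_eyed = Fraction(40, 90)             # held exactly, as 4/9
flower_fund = 25 * Decimal("0.15")        # exactly 3.75 dollars


def decimal_expansion(fraction, places):
    """First `places` decimal digits (places >= 1) of a positive fraction,
    truncated, not rounded."""
    scaled = fraction.numerator * 10**places // fraction.denominator
    whole, digits = divmod(scaled, 10**places)
    return f"{whole}.{digits:0{places}d}"


print(decimal_expansion(pupil_teacher_ratio, 6))  # 21.666666
print(decimal_expansion(brown_eyed, 4))           # 0.4444
print(flower_fund)                                # 3.75
```

The ratio itself is exact; only the decimal expression of it must stop somewhere, and it can be made to stop as late as we please.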

Approximate Numbers. An approximate number is an estimate
of an exact or “true” value. A theoretically exact number becomes
an approximate number when it is rounded off. Thus, the exact
pupil-teacher ratio 650/30, above, may be written as an approximation
as 20, 22, 21.7, 21.67, and so on to any degree of exactness.
The decimal expressions of the mathematical processes which cannot
be performed exactly, such as the processes indicated by
650/30, 40/90, √2, and π, are necessarily approximations of the
exact or true values. The numbers in tables of logarithms, square
roots, compound interest functions—in fact, the numbers in nearly 
all mathematical tables—are approximate, because they have been 
rounded off to a convenient number of digits. For example, in a 
four-place table, the square root of 865 is entered as 29.41. Such an 
approximation tells us that the square root of 865 is nearer to 29.41 
than to either 29.40 or 29.42, or that the exact value lies between 
29.405 and 29.415. Thus, one source of approximate numbers is the 
rounding of decimal expressions of exact numbers. 
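The roundings mentioned above can be verified with a few lines of arithmetic. This is a minimal sketch using Python's built-in round and math.sqrt; the variable names are our own.

```python
import math

# The exact ratio 650/30 written to several degrees of exactness:
# to the nearest ten, the nearest unit, the nearest tenth, the nearest hundredth.
ratio = 650 / 30
approximations = [round(ratio, -1), round(ratio), round(ratio, 1), round(ratio, 2)]
print(approximations)  # [20.0, 22, 21.7, 21.67]

# The table entry for the square root of 865, and the limits it implies.
root = math.sqrt(865)
table_entry = round(root, 2)
within_limits = 29.405 <= root < 29.415
print(table_entry, within_limits)  # 29.41 True
```

The second check is the statement in the text turned around: an entry of 29.41 asserts no more than that the exact root lies between 29.405 and 29.415.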

The second source of approximate numbers is measurement by 
instrument, whether the instrument be an educational test, attitude 
scale, a human pace, thermometer, laboratory balance, yardstick, 
or vernier caliper. As has been noted, measurement always results 
in a number which is taken to represent some property of an object 
or event. Experience and common sense convince us that we can 
never be certain regarding this number. Under even ideal conditions,
the number can be no more exact than the finest division on the
measuring instrument. Beyond that division, it can be only an
estimate. Hence, any measurement must be interpreted as an
approximation to a true dimension. A true dimension of a thing is
a theoretical value; it may be thought of as the average of all the
measurements which could be made. A single measurement, then, 
necessarily is an approximate number. 

Precision and Accuracy of Approximate Numbers. Precision 
and accuracy of measurement are involved topics; we will only 
touch on them here. For more complete treatment, the student is 
referred to Refs. 1, 5, 8, 9. The selection of a unit, both as to kind 
and size, for measuring a property of an object or event is an arbi- 
trary matter, although the kind of unit will ordinarily be suggested 
by usage, convenience, or the purpose of the measurement, and the 
size of unit suggested by the degree of exactness desired or dictated 
by the instrument available. 

The size of the unit used in making a measurement affects the 
precision of the measurement. Obviously, it is possible to measure 
length more precisely with a ruler graduated in tenth-inches than 
with one graduated no finer than inches. Our decimal system provides
a convenient method of indicating precision. If we wish to
report a measurement of, say, 115 tenth-inches, we simply write
11.5 inches, and thereby avoid the awkward expression “tenth-inches.”
Similarly, we may report 95.45 pounds instead of 9,545
hundredth-pounds, and 186,000 miles instead of 186 thousand- 
miles. The student should not be confused by the fact that a meas- 
urement may not have been made in the units in which it is ex- 
pressed. It is doubtful whether any measurement has ever been 
made using hundredth-pound or thousand-mile units. Such units 
arise in the process of combining and rounding numbers in indirect 
measurements and in the operations involved in applying and read- 
ing scales. They mean simply that the measurement may be con- 
sidered exact to the nearest hundredth-pound or thousand-miles 
and no further. Thus, precision is judged by the size of the unit used 
in reporting a measurement, or any approximate number, for that 
matter. 

It is generally agreed to consider an approximate number accurate 
to the nearest unit in which it is reported. Thus, the final digit in 
an approximate number is never considered to be exact, but only 
an indication of the lower and upper limits of the number. The digit 
4 in the reported measurement 124 months, for example, indicates 
that the measurement lies between 123.5 and 124.5, being closer 
to 124 than any other three-digit number. The digit 4 is thus a 
meaningful digit, although it cannot be considered as exact. It is 
therefore regarded as uncertain or in doubt. This convention is 
similar to that used in interpreting the limits of a rounded exact 
number, previously noted. 
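
The convention can be sketched in a few lines of Python. This is an illustration only, not part of the original treatment: the function name implied_limits is an assumption introduced here, and the reported number is supplied as text so that its final written digit is preserved.

```python
from decimal import Decimal


def implied_limits(reported: str) -> tuple[Decimal, Decimal]:
    """Return the lower and upper limits implied by a reported number.

    Assumption: the last written digit marks the unit, so "11.5" is taken
    to be reported in tenths. Whole numbers ending in zeros are ambiguous
    under this convention and are not handled specially.
    """
    value = Decimal(reported)
    # The exponent of the last written digit gives the size of the unit,
    # e.g. "11.5" has exponent -1, hence a unit of 0.1.
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = unit / 2
    return value - half_unit, value + half_unit


low, high = implied_limits("124")
print(low, high)  # 123.5 124.5
```

The number is read as text rather than as a float because a float cannot distinguish 12.0 from 12, and that distinction is exactly what the convention depends upon.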

The digits which indicate the number of units in an approximate 
number are said to be significant digits, the final digit, as pointed 
out above, being considered uncertain. All of the digits from 1 to 
9, inclusive, are significant. Zeros are significant when they convey 
information about the number of units in the number; they are not 
significant when they indicate only the size of the unit used, i.e., 
the precision of the number. 

In some instances, we can be sure zeros are significant. When a 
zero appears between two significant digits, it is, of course, signifi- 
cant, e.g., the zeros in 1,005 are significant. Zeros at the end of a 
decimal number are always significant. The zero in 12.0, for example, 
is significant, for it indicates the limits 11.95-12.05 and asserts the 
number is closer to 12.0 than any other three-digit number. 

In some instances we can be certain that the zeros in approximate 
numbers are not significant. In decimal fractions, the zeros between 
the decimal point and the first nonzero digit are never significant 
for they only indicate the size of the unit. In the measurement .005 
pounds, for example, the zeros indicate that the unit is thousandth- 
pound. Only the digit 5 is significant. 

In interpreting reports involving integral or whole approximate 
numbers, we can never be sure regarding the significance of zeros. 
The measurement, 186,000 miles, might mean 186 thousand- 
miles, 1,860 hundred-miles, 18,600 ten-miles, or 186,000 single 
miles, and hence might indicate the limits 185,500-186,500, 185,950- 
186,050, 185,995-186,005, or 185,999.5-186,000.5. In reporting such 
numbers, if clarity is desired, a bar may be placed over the last 
significant digit or the numbers may be written as products of their 
significant digits and powers of ten. Thus, if 186,000 miles can be 
considered to be exact only to the nearest thousand miles, that 
information should be conveyed by writing either 186,000 with a bar 
over the 6 or, better, 186 × 10³. 
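
The four readings of 186,000 miles can be checked numerically. The following Python sketch is illustrative only; the helper name limits_for_unit is an assumption introduced here, and the unit must be stated explicitly because the written number alone does not reveal it.

```python
def limits_for_unit(value: float, unit: float) -> tuple[float, float]:
    """Limits implied when `value` is exact only to the nearest `unit`."""
    half_unit = unit / 2
    return value - half_unit, value + half_unit


# The same written number under four possible units of measurement.
for unit in (1000, 100, 10, 1):
    low, high = limits_for_unit(186_000, unit)
    print(f"unit of {unit:>5} miles: {low:,} to {high:,}")
```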

The accuracy of an approximate number is judged in terms of the 
percentage of error it contains. The percentage of error may always 
be estimated from the number itself. Since reporting an approximate 
number such as 89 amounts to the assertion that the number lies 
between 88.5 and 89.5, the maximum error admitted is .5, and the 
percentage of error is 100(.5/89) or .6 per cent. In further 
illustration, the reported approximate number 15.5 is asserted to lie 
between 15.45 and 15.55 and to contain an error no greater than .05; 
consequently, the percentage of error is 100(.05/15.5) or .3 per cent. 
Thus, the number 89 has less accuracy than the number 15.5. 

In general, the percentage of error of an approximate number is 
determined by dividing the maximum error by the number and 
multiplying by 100. The percentage of error as defined is a function 
of the significant digits. Thus, the accuracy of an approximate 
number is related to the number of significant digits it contains. 
It ordinarily is sufficient to describe the accuracy of approximate 
numbers as one-figure, two-figure, and so on, rather than in terms 
of percentage error. The numbers .005, 89, and 15.5 have one- 
figure, two-figure, and three-figure accuracy, respectively. 
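
As a rough numerical check of these statements, the sketch below computes the percentage of error and the number of significant digits for the three numbers just cited. It is written in Python as an illustration; the function names are assumptions introduced here, and the digit count treats every written digit after any leading zeros as significant, which is not safe for whole numbers ending in zeros.

```python
from decimal import Decimal


def percentage_of_error(reported: str) -> Decimal:
    """Maximum error (half the reporting unit) as a per cent of the number."""
    value = Decimal(reported)
    max_error = Decimal(1).scaleb(value.as_tuple().exponent) / 2
    return 100 * max_error / abs(value)


def significant_figures(reported: str) -> int:
    """Count written digits, ignoring leading zeros (assumed convention)."""
    return len(Decimal(reported).as_tuple().digits)


for reported in (".005", "89", "15.5"):
    print(reported,
          significant_figures(reported),
          round(percentage_of_error(reported), 1))
```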

The discerning student will note that we have neglected to deal 
here with the error of measurement resulting from unreliability 
of the measuring instrument, which, at least in educational and 
psychological measurement, ordinarily is much greater than the 
error inherent in an approximate number itself. This sort of error 
will be treated later. At present, let us re-emphasize that, under 
even ideal conditions, we cannot claim 100 per cent accuracy for 
an approximate number. 

Computing with Approximate Numbers. In computing with 
approximate numbers, it generally is necessary to round off results, 
and we shall consider this topic first. A number is rounded off by 
dropping one or more digits at the right. The conventions for round- 
ing numbers in statistics are: 


a. In rounding whole numbers, the dropped digits are replaced by 
zeros, but in rounding decimal numbers dropped digits are not 
replaced. Example: Rounded to one-figure accuracy, 19 becomes 
20, but .019 becomes .02. 

b. If the value of the digit(s) to be dropped is less than 5, no change 
is made in the preceding digit. Example: Rounded to two-figure 
accuracy, 1,549 becomes 1,500, and .3333 becomes .33. 

c. If the value of the digit(s) to be dropped is greater than 5, the 
preceding digit is increased by 1. Example: Rounded to four-figure 
accuracy, 66.666 becomes 66.67, and 5,112.51 becomes 5,113. 

d. If the value of the digit(s) to be dropped is exactly 5, no change 
is made in the preceding digit if it is even, but if it is odd it is 
increased by 1. Example: Rounded to three-figure accuracy, 
25.650 becomes 25.6, but 25.75 becomes 25.8. 
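
The four conventions together amount to what is now commonly called round-half-to-even. A minimal Python sketch follows, assuming the standard decimal module; the function name round_to_figures is an assumption introduced for illustration, and numbers are passed as text so that no binary floating-point error enters.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_to_figures(reported: str, figures: int) -> Decimal:
    """Round to a given number of significant figures, halves going to even."""
    value = Decimal(reported)
    if value == 0:
        return value
    # adjusted() is the exponent of the leading digit (1 for 66.666),
    # so this is the exponent of the last digit to be kept.
    last_kept = value.adjusted() - figures + 1
    quantum = Decimal(1).scaleb(last_kept)
    return value.quantize(quantum, rounding=ROUND_HALF_EVEN)


# format(..., "f") prints whole numbers with their replacement zeros.
print(format(round_to_figures("19", 1), "f"))       # 20
print(format(round_to_figures("5112.51", 4), "f"))  # 5113
print(format(round_to_figures("25.650", 3), "f"))   # 25.6
print(format(round_to_figures("25.75", 3), "f"))    # 25.8
```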


The principles governing approximate computation are too in- 
volved to be developed here. We shall attempt only to describe the 
commonly followed procedures and to show their reasonableness. 

In adding or subtracting approximate numbers, in general, the 
answer will be no more precise than the least precise number in the 
process. Consider the measurements 4, 4.26, 16.4, and 16.323, the 
uncertain digit in each being the final one. The four approximate 
numbers and their lower and upper limits are summed below. 


APPROXIMATE NUMBERS    LOWER LIMITS    UPPER LIMITS 
       4                  3.5             4.5 
       4.26               4.255           4.265 
      16.4               16.35           16.45 
      16.323             16.3225         16.3235 
      ------             -------         ------- 
      40.983             40.4275         41.5385 


If the sum of the four is reported as 40.983, five-figure accuracy 
is claimed for a number containing four uncertain digits. Rounded 
off, the sum becomes successively 40.98, 41.0, and 41. The rounded 
sum 41 contains only one uncertain digit and is well within the sums 
of the lower and upper limits of the measurements. We might have 
saved work by rounding each number so that none contained more 
than one digit to the right of the terminal digit of the least precise 
number. If this is done, we get the sum 41.0 and round to 41, which 
is as accurate a result as we can obtain from the given data. Let 
us note, parenthetically, that measurements to be added or sub- 
tracted should be made to the same degree of precision, and “ragged 
decimals” thereby avoided. The procedure to follow in adding or 
subtracting approximate numbers is: 

Round off the numbers so that none have more than one digit to the 
right of the terminal digit in the least precise number, and after the 
addition or subtraction round the sum or difference to the same precision 
as the least precise number. 
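
The procedure can be sketched in Python as follows. This is one possible reading of the rule, not a definitive implementation; the function name add_approximate is an assumption introduced here, and the numbers are given as text so that each one's final written digit is known. A number to be subtracted may be entered with a minus sign.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def add_approximate(reported: list[str]) -> Decimal:
    """Add approximate numbers, rounding to the least precise of them."""
    values = [Decimal(r) for r in reported]
    # Exponent of the terminal digit of the least precise number
    # (0 for units, -1 for tenths, and so on).
    coarsest = max(v.as_tuple().exponent for v in values)
    # Step 1: allow at most one digit beyond that terminal digit.
    guard = Decimal(1).scaleb(coarsest - 1)
    trimmed = [
        v.quantize(guard, rounding=ROUND_HALF_EVEN)
        if v.as_tuple().exponent < coarsest - 1 else v
        for v in values
    ]
    # Step 2: add, then round to the precision of the least precise number.
    total = sum(trimmed, Decimal(0))
    return total.quantize(Decimal(1).scaleb(coarsest),
                          rounding=ROUND_HALF_EVEN)


print(add_approximate(["4", "4.26", "16.4", "16.323"]))  # 41
```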


In multiplying or dividing approximate numbers, the answers will 
be no more accurate than the least accurate number in the process. 
Consider the approximate numbers and their lower and upper 
limits, as shown below. 


APPROXIMATE NUMBERS    LOWER LIMITS    UPPER LIMITS 
      28.7               28.65           28.75 
      22                 21.5            22.5 
     -----              -------         ------- 
     631.4              615.975         646.875 


The product 631.4 contains three uncertain digits. It should there- 
fore be rounded to 630, which is about halfway between the products 
of the sets of limits of the given numbers and which is consistent 
with the accuracy of the given numbers. 

Similarly, it may be shown that division of approximate numbers 
will yield results of no greater accuracy than the least accurate 
number involved. The procedure to follow in multiplying or dividing 
approximate numbers is: 

Round off the more accurate factor(s), divisor or dividend, until it 
contains only one more significant digit than the least accurate. After 
the multiplication or division is performed, round off the product or 
quotient to the same number of significant digits as the least accurate 
number involved. 


To illustrate the procedure, let us multiply 48.25 by 64, both 
being approximate numbers. We round 48.25 to 48.2, multiply 48.2 
by 64, and round the product 3,084.8 to 3,100, since the least ac- 
curate factor 64 has two-figure accuracy. 
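
The same steps can be sketched in Python. The sketch is illustrative and rests on stated assumptions: the function names are introduced here, numbers are supplied as text, and every written digit after any leading zeros is counted as significant, which is unsafe for whole numbers ending in zeros.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def figures_in(reported: str) -> int:
    """Count written digits, ignoring leading zeros (assumed convention)."""
    return len(Decimal(reported).as_tuple().digits)


def round_to_figures(value: Decimal, figures: int) -> Decimal:
    """Round to a number of significant figures, halves going to even."""
    quantum = Decimal(1).scaleb(value.adjusted() - figures + 1)
    return value.quantize(quantum, rounding=ROUND_HALF_EVEN)


def multiply_approximate(a: str, b: str) -> Decimal:
    """Multiply two approximate numbers by the rule stated above."""
    keep = min(figures_in(a), figures_in(b))
    factors = []
    for reported in (a, b):
        value = Decimal(reported)
        # Trim the more accurate factor to one figure more than the least.
        if figures_in(reported) > keep + 1:
            value = round_to_figures(value, keep + 1)
        factors.append(value)
    return round_to_figures(factors[0] * factors[1], keep)


print(format(multiply_approximate("48.25", "64"), "f"))  # 3100
```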

It is generally advisable to carry out division one more place than 
is to be retained, in order to be certain of the effect of rounding. 
It is left as an exercise for the student to show that a power or root 
of an approximate number is no more accurate than the number. 
(See exr. 29.) 

Occasionally the rule for multiplying or dividing approximate 
numbers results in unreasonable error in the answer. Consider the 
product of 11.2 and 99, which, by the rule, would be written 1,100. 
The percentage of error of the least accurate factor is .5 per cent, 
that of the product 4.5 per cent. When the first significant digit in 
the least accurate number involved is substantially greater than 
the first significant digit in the product or quotient, the retention 
of one more digit than the rule suggests may be justified. 
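
The percentages quoted for this example can be verified with a short Python sketch. The function name and the explicit unit argument are assumptions introduced for illustration; the unit is passed separately because the trailing zeros of 1,100 do not reveal it.

```python
def percentage_of_error(value: float, unit: float) -> float:
    """Half the reporting unit, expressed as a per cent of the number."""
    return 100 * (unit / 2) / value


# Least accurate factor, 99, reported to the nearest unit.
print(round(percentage_of_error(99, 1), 1))
# Product by the rule, 1,100, exact only to the nearest hundred.
print(round(percentage_of_error(1100, 100), 1))
# Product with one more digit retained, 1,110, exact to the nearest ten.
print(round(percentage_of_error(1110, 10), 1))
```

Retaining the extra digit brings the percentage of error of the product back to about the same size as that of the least accurate factor.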

In computations involving approximate and exact numbers, 
only the approximate number affects the accuracy of the results. 
To illustrate, if the approximate number 75.67 is multiplied by the 
exact number 65, the product has four-figure accuracy and is 
rounded to 4,919. If the approximate number 6,620 is divided by 
the exact number 80, the quotient has three-figure accuracy and 
is rounded to 82.8. 

There are no hard and fast rules for computing with approximate 
numbers. It is always necessary to use judgment. We generally 
wish to obtain results as accurate as possible, but we should not 
give a false impression of accuracy by carrying out answers further 
than the original numbers warrant. A good general precaution to 
observe is: 


The results of computations with approximate numbers must be ac- 
curate as far as they go, and they should go far enough to express the 
needed degree of accuracy. Hence, no more figures should be written 
than are known to be correct, and no figures which are known to be 
correct should be omitted inadvertently. 


When a series of computations is to be performed, it is usually 
a good plan to carry along two or three extra figures and then 
to cut the final results back to the number of significant figures 
warranted by the original data. 


Exercises 


22. All measurement is characterized by an approximation error. What are 
several other sources of error? 

23. Refinements are constantly being made in industrial and scientific 
measurement. Do you believe the time will ever come when we can be 
absolutely certain of a measurement? Why? 

24. How precise ordinarily are measurements of rainfall? Weights of 
people? Test scores? 

25. Comment on the statement, “It really is just as bad technique to 
make a measurement more accurately than is necessary as it is to make 
it not accurately enough.” 

26. Interpret the measurements given in the left-hand column below. The 
first is correct. 


REPORTED        UNIT          SIGNIFICANT    IMPLIED        PER CENT 
MEASUREMENT     IMPLIED       DIGITS         LIMITS         ERROR 

62.4 ft.        tenth-foot    6, 2, 4        62.35-62.45    .08 
39.37 in. 
24 months 
.025 lbs. 
4,000 miles 
4,000 miles 


27. What error in 5,000 ft. would correspond to an error of .1 in. in 4 ft. 
2 in.? 
28. Round each of the following numbers to three-figure accuracy: 


56,982    99.00    3.4650    .06750    258.5    3.0048 


29. Examine the squares and square roots of several approximate numbers. 
How many significant digits does a square or square root of a given 
number contain? 

30. Assume that all of the numbers involved in the following computations 
are approximate. Round off results to the appropriate degree of 
accuracy. 

(4.13)²    (.413)²    (3.1416)(24)    (.02)(48.5) 
√243    √24    2/3    .33/86 
3,245.5 + 163.27 + 2.468 + 13.3 =        6.315 - 12.5 + 3.29 = 


31. If 75 is an exact number and all of the other numbers in the computa- 
tions below are approximate, how many significant digits should be 
retained in each result? 


(75)²    √75    (75)(46.24)    (7.26)² 
75/321 85/75 5,621 (9.246) (15)/25 


(7.3) (4.22) (15) 75 + 2.15 + 7.3 




32. Point out the inconsistencies in each of the following. What is the 
proper procedure in each? 


a. A school superintendent’s budget contained these items: library 
books, $5,250; periodicals, $525.25; book repairs, $250. Subtotal, 
$6,025.25. 

b. A textbook reported the diameter of the sun as 864 × 10³ miles, 
the equatorial diameter of the earth as 7,926 miles, and the ratio of 
the two as 109.008. 

c. In finding the circumference of a circle by the formula C = πd, a 
student measured the diameter with a tape graduated to quarter 
inches and multiplied by 3.1416. 


The Study of Statistics 


The several aims of an introductory course in statistics which 
have been implied in the preceding pages are stated below in terms 
of what the student is expected to acquire: 


a. An understanding of statistical method as applied in educational 
research and appraisal and the ability to interpret statistical 
findings. 

b. The ability to compute various statistical measures, to analyze 
numerical data intelligently, and to report statistical data. 

c. A thoughtfully critical attitude toward statistical method and 
the ability to detect misuses of statistics. 

d. An appreciation of the value of statistics in dealing with 
educational problems. 


Mathematical Training Not Necessary. As a tool in research, 
statistics comprises methods of collecting and analyzing quantita- 
tive evidence relating to problems which cannot be dealt with suc- 
cessfully by intuition or by logical methods. Statistical thinking is 
essentially similar to scientific thinking. The central question is 
always, “What is the evidence and what can we learn from it?” 

Being quantitative, statistics leans heavily upon mathematics, 
both in theory and application. Although the mathematics required 
in application are largely arithmetic and easy algebra, parts of 
statistical theory require higher mathematics. The student who has 
had mathematical training is fortunate indeed. He will find both 
the application and theory of elementary statistics relatively simple. 
But the study of statistics in educational work cannot be limited 
to those trained in mathematics. Statistical methods are so generally 
needed and the statistical treatment and interpretation of observa- 
tional data are so widely used that any worker in education is 
seriously handicapped without some knowledge of statistics. Stu- 
dents not trained in mathematics, although they will rarely become 
accomplished statisticians, can acquire all of the abilities listed 
above. Those who wish to review mathematics, as used in elemen- 
tary statistics, will find Walker’s book (Ref. 9) extremely helpful. 

How to Study Statistics. Statistics tends to be a difficult study, 
not because it is mathematically complex, but because it involves 
a point of view and ideas which are new to most people. If the stu- 
dent is to acquire more than a superficial understanding and wooden 
use of statistics, a great deal of patient, independent work and hard 
thinking will be necessary. 

The writer's experience with beginning students has led to several 
convictions regarding effective methods of study. As a rule, the 
student should read carefully the material in a section before at- 
tempting to do the exercises and then reread the material after he 
has worked the exercises. The application of statistical techniques 
to particular problems not only develops skill in application; it 
contributes enormously to the understanding of theory. 

Since statistical theory and technique usually relate to numerical 
facts, a sort of study which may be described as numerical checking 
or numerical clarification may be of aid to the student. As a simple 
illustration of numerical clarification, suppose the statement is made 
that the addition of a constant to each of N scores changes the sum 
of the original scores by an amount equal to N times the constant. 
This somewhat involved statement is easy to check or clarify nu- 
merically. Let three numbers 2, 4, and 5, be the scores. The sum of 
the three is 11. Now suppose that we add the constant 10 to each 
of the three. We now have 12, 14, and 15, whose sum is 41. The 
difference between the sums is 30, and 30 is equal to N times the 
constant, i.e., 3 × 10. The above is equivalent to checking the algebraic 
equation Σ(X + k) = ΣX + Nk, in which Σ means sum; X takes 
the values 2, 4, and 5; N = 3; and k = 10. 

The student will find many opportunities for numerical checking 
and clarification. The manipulation of a simple, regular series, such 
as 3, 4, 5, 6, 7, or of an irregular series, such as 0, 3, 5, 37, may help 
in clarifying a difficult concept. In later pages, numerical study will 
be suggested from time to time. 

We should remark, perhaps, that a numerical check does not con- 
stitute general proof. The check illustrated above shows only that 
the statement holds true for the numbers 2, 4, and 5 and the constant 
10. However, if a general statement or formula is false, its falsity 
may sometimes be detected by numerical check. 
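
For the student with access to a computer, a numerical check of this kind can be sketched in Python. The first part repeats the check of Σ(X + k) = ΣX + Nk with the scores 2, 4, and 5 and the constant 10. The second part tests a deliberately false statement, Σ(X + k)² = ΣX² + Nk², which is a hypothetical example introduced here to show how a check can expose an error.

```python
scores = [2, 4, 5]
k = 10
n = len(scores)

# True statement: adding k to each score raises the sum by N times k.
sum_after = sum(x + k for x in scores)      # sum of 12, 14, 15
sum_before_plus_nk = sum(scores) + n * k    # 11 + 3(10)
print(sum_after, sum_before_plus_nk)        # 41 41

# False statement (hypothetical): the two sides disagree, so the
# statement cannot be true in general.
left = sum((x + k) ** 2 for x in scores)
right = sum(x ** 2 for x in scores) + n * k ** 2
print(left, right)                          # 565 345
```

Agreement in the first check confirms the statement only for these particular numbers; disagreement in the second is enough to reject the statement outright.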

There is another noteworthy aid to understanding statistical 
concepts. If the student has statistical data in which he has a 
personal interest and subjects the data to the various techniques 
to be described in the following pages, he usually gains rapidly in 
understanding. The writer provides numerous opportunities for 
application of techniques, but none will be as productive as those the 
student will make if he studies problems of his own choosing. 


Exercises 


33. Check numerically to determine the meaning and possible incorrect- 
ness of the following: 


a. The square root of the product of two numbers is equal to the 
product of the square roots of the numbers. 

b. The square of the sum of three numbers is equal to the sum of the 
squares of the numbers. 

c. ax + by + cz = (a + b + c)(x + y + z) 


34. Numerically investigate the effect upon the sum of a statistical series 
of multiplying each number in the series by a constant. The effect 
upon the sum of a statistical series of decreasing each number in the 
series by a constant. 


Chapter II 


Organization and Presentation of 


Statistical Data 


IT IS frequently the case that statistical data must be organized 
in tables or graphs before their meaning becomes clear. The process 
not only reduces the bulk of the data to comprehensible size, it 
brings into relief the resemblances and differences within and be- 
tween classes of data and thus facilitates comparison. Moreover, 
tabulating or grouping data often simplifies the computation of the 
important summary statistics considered in later chapters. 

In this chapter the commonly used methods of organizing and 
presenting statistical data will be discussed, with emphasis upon 
the frequency distribution, a fundamental concept in statistical 
theory. 


Statistical Tables 


When data are tabulated or arranged in selected classes, and the 
arrangement described by titles and subtitles, a statistical table re- 
sults. Such tables are of great usefulness in reporting the results of 
an investigation. Referring to the college freshmen of Table II, 
Appendix B, suppose that we wish to compare the socioeconomic 
ratings of those who attended private with those who attended 
public secondary schools. We might summarize in a paragraph: 


Thirteen or 27.7 per cent of the freshmen who had attended private 
schools had A socioeconomic ratings; twenty-two or 46.8 per cent had 
B ratings; and twelve or 25.5 per cent had C ratings. Twenty-one or 
21.2 per cent of the freshmen who had attended public schools had A 
socioeconomic ratings; forty-five or 45.5 per cent had B ratings; and 
thirty-three or 33.3 per cent had C ratings. 

We may say the same thing more clearly and concisely by means of 
a table, such as Table 2.1. 


TABLE 2.1 

TYPES OF SECONDARY SCHOOL ATTENDED AND 
SOCIOECONOMIC RATINGS OF 146 LIBERAL ARTS 
COLLEGE MALE FRESHMEN 

SOCIOECONOMIC     PRIVATE SCHOOL          PUBLIC SCHOOL 
RATINGS*          NUMBER   PER CENT       NUMBER   PER CENT 
A                   13       27.7           21       21.2 
B                   22       46.8           45       45.5 
C                   12       25.5           33       33.3 
TOTAL               47      100.0           99      100.0 

* Based upon occupation of father and educational 
attainments of both parents. 


General and Special Purpose Tables. It is desirable to dis- 
tinguish between two kinds of tables, the general purpose and the 
special purpose. The general purpose table is essentially a reference 
or basic data table. Its object is to present the data resulting from a 
part or all of an investigation in such manner that any particular 
item may be readily located or referred to. Its very comprehensive- 
ness may make the general purpose table rather involved and com- 
plex. Ordinarily, much of the complexity can be resolved by alpha- 
betizing or serially numbering items, by clear captions and stub 
designations, and by logical organization of data. Tables I and II, 
Appendix B, are illustrations of general purpose tables. 

When data are classified or reclassified in order to bring out in- 
terrelationships, trends, and so forth, the special purpose table re- 
sults. Since the purpose of the table is to focus data on a particular 
issue, its outstanding characteristics should be simplicity and 
clarity. Ordinarily, the story told by the special purpose table is elab- 
orated in the text. Table 2.1 is an example of a special purpose table. 

The Construction of Tables. The mechanical features of a sta- 
tistical table include the number, title, horizontal arrays or rows, 
and vertical arrays or columns. Two parts of the table are reserved 
for designating the classes. One is at the left-hand column and is 
called the stub; the other, at the top of the columns. The headings 
of the columns are called captions. In Table 2.1, the socioeconomic 
ratings are designated in the stub; the types of school attended, by 
captions. 

The mechanical features of both general and special purpose 
tables are the same. Table, number of table (often in Roman 
numerals), and title are written at the top, with words or first letters 
of words in capitals. Heavy single rules are used at the top and 
bottom of the table. Single horizontal and vertical rules are used as 
needed to separate classifications. It is the preferred practice to 
use rules as sparingly as possible. Table 2.1 illustrates a 
conventional form, the footnote being desirable because there are various 
ways of rating socioeconomic status. 

A satisfactory statistical table is a product of ingenuity and effort. 
It combines appropriate classification, a minimum of space, and a 
maximum of clearly presented information which bears on the 
problem or problems under investigation. Although few rules can be 
given for the construction of tables, the following general 
considerations should be kept in mind: 


a. The function of the title is to describe the contents. It tells what 
the data are, how they are classified, and, if pertinent, where and 
when they were collected. It should be as brief as is compatible 
with completeness and clarity. 

b. The captions and the heading of the stub should be clear and 
should be consistent with the terms used in the title. The 
function of captions and heading is to give the breakdown of the 
tabulations. 

c. Possible ambiguity of a unit of measurement or a term should 
be clarified by a footnote. Any exceptional circumstances should 
be footnoted. 

d. A table should be, insofar as possible, self-contained and 
self-explanatory. 


The interested student can find further and more detailed 
suggestions in Refs. 7, 8, and 9, listed at the end of the chapter. 


Exercises 


1. The following data are taken from the Statistical Abstract of the United 
States, 1943, p. 218. Organize the data in a general purpose table. 
Secondary School and College Enrollments for Continental United States, 
1900, 1920, 1940. Public high schools: 1900, 519,251; 1920, 2,200,389; 
1940, 6,601,444. Private high schools: 1900, 110,797; 1920, 213,920; 
1940, 457,768. Normal schools and teachers’ colleges: 1900, 69,593; 
1920, 135,435; 1940, 177,045. Colleges, universities, and professional 
schools: 1900, 167,999; 1920, 462,445; 1940, 1,316,158. 

2. Construct a special purpose table to bring into comparison increases in 
public and in private high school enrollments, as given in exr. 1, above. 

3. It has been said that all research reports should append general purpose 
tables. Why is such a rule generally a good one? Under what conditions 
should exceptions be made? 


The Frequency Distribution 


One of the commonest and simplest of special purpose tables is 
the frequency distribution. Although qualitative data, as we have 
seen, must be classified as frequencies in specified categories or 
qualitative classes, the term frequency distribution, as used in 
statistics, ordinarily refers to the tabulation of quantitative data in 
classes which vary in size. 

When we are interested primarily in the manner in which the 
items in a long quantitative series vary in size, the frequency dis- 
tribution is an appropriate method of classification. Whether the 
series is discrete or continuous, its frequency distribution has both 
theoretically and practically important aspects in statistics. The 
more elementary theoretical aspects will be touched on in this sec- 
tion; in later chapters, the distribution will be utilized to simplify 
several types of statistical computations. 

The Distribution of Discrete Series. To illustrate the fre- 
quency distribution of a discrete series, let us consider a rather 
typical empirical study of the operation of chance. Suppose we have 
tossed 5 coins 20 times and have observed the results: 


HTTTT   TTTTT   HHTTT   HTTTT   HHHTT 
HHTTT   HHHHT   HHHTT   HHHTT   HHTTT 
HHTTT   HHHTT   HTTTT   HHHTT   HHHHH 
HHHTT   HHTTT   HHHHH   HHHTT   HHHTT 


In this situation, the frequencies of 5 heads, 4 heads, 3 heads, and 
so on constitute the information of concern. This information may 
be conveniently and informatively classified, as shown in Table 2.2. 


TABLE 2.2 

DISTRIBUTION OF HEADS ON 20 
TOSSES OF 5 COINS 

                     FREQUENCY OF 
HEADS                OCCURRENCE 
  5                      2 
  4                      1 
  3                      8 
  2                      5 
  1                      3 
  0                      1 
TOTAL FREQUENCY         20 


The discrete values, 5 to 0, constitute the classes in this distribution. 
The frequencies in the classes are recorded in the right-hand column. 
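
The tabulation can be reproduced mechanically. The Python sketch below counts the heads in each toss and tallies the classes with the standard Counter class. It is an illustration only: the list of tosses is typed in by hand, and three of the twenty entries (two tosses with one head and one with no heads) are an assumption chosen so that the data agree with the frequencies in Table 2.2.

```python
from collections import Counter

# Twenty tosses of five coins; H is a head and T a tail.
tosses = """
HTTTT TTTTT HHTTT HTTTT HHHTT
HHTTT HHHHT HHHTT HHHTT HHTTT
HHTTT HHHTT HTTTT HHHTT HHHHH
HHHTT HHTTT HHHHH HHHTT HHHTT
""".split()

# Each class is a number of heads; its frequency is the number of tosses
# in which that many heads appeared.
frequency = Counter(toss.count("H") for toss in tosses)

print("HEADS  FREQUENCY")
for heads in range(5, -1, -1):
    print(f"{heads:>5}  {frequency[heads]:>9}")
print(f"TOTAL  {sum(frequency.values()):>9}")
```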

In further illustration, suppose a research worker has investigated 
the size of the families represented by 1,000 high school students. To 
make his data manageable, he would ordinarily first classify them 
in order of frequency of occurrence of different sizes of families. 
In other words, he would construct a frequency distribution. 

The Distribution of Continuous Series. The classification of 
measures of a continuous series is usually somewhat more involved 
than that of a discrete series. Consider the 138 VAT scores of the 
college freshmen, shown in Table II, Appendix B. As given in the 
table we can learn little from them. The scores might be arranged 
in descending or ascending order, and tally marks used to indicate 
the frequency of each score. If this were done, we would have: 


SCORE    FREQUENCY 
345      / 
357      / 
370      / 
392      / 
395      / 
. . . 
751      / 


This arrangement would be one possible frequency distribution of 
the scores, but it would not give us the “best” view of the tendency 
of the scores to cluster around certain values, and very likely would 
be substantially altered if we added the scores from some other 
freshman group. In short, such a distribution would be too detailed 
to give a comprehensive or stable picture of the performance of 
freshmen on the verbal aptitude test. Another objection to such a 
distribution, although of less importance, is that it would fail to 
simplify later computations to an appreciable extent. 

Now let us telescope the array by indicating merely the number 
of scores falling in the interval 330 to 359, 360 to 389, 390 to 419, 
and so on. This gives one or the other of the distributions shown in 
Table 2.3, depending upon whether we indicate the score classes 
in ascending or descending order. Although either method may be 
used, the latter is the more common. 


TABLE 2.3 

DISTRIBUTION OF SCORES OF 138 MALE 
LIBERAL ARTS COLLEGE FRESHMEN ON 
COLLEGE BOARD VERBAL 
APTITUDE TEST 

SCORE      FREQUENCY        SCORE      FREQUENCY 
330-359        2            780-809        1 
360-389        1            750-779        2 
390-419        6            720-749        3 
420-449        7            690-719        6 
450-479       14            660-689        7 
480-509       24            630-659       12 
510-539       16            600-629        8 
540-569       14            570-599       15 
570-599       15            540-569       14 
600-629        8            510-539       16 
630-659       12            480-509       24 
660-689        7            450-479       14 
690-719        6            420-449        7 
720-749        3            390-419        6 
750-779        2            360-389        1 
780-809        1            330-359        2 

The student may well ask why the particular selection of 330-359 
. . . 780-809 classes was made in the present example. The answers 
to the question will provide general suggestions for constructing 
the frequency distribution. For the data in hand, the selection 


a. Permitted 16 classes. Fewer than about 10 classes or more than 
about 20 tend to obscure the significant features of most 
collections of data. Some writers defend the “about 10-20 classes” rule 
on the basis of ease and accuracy of later computations. These 
are only secondary considerations; the first is that of provision 
for a comprehensive and stable classification. (See exr. 4.) 

b. Permitted a convenient classification or grouping interval. Some 
multiple of 5 is to be preferred as the grouping interval if its 
use does not result in the violation of other desirable features. 

c. Permitted the extreme measures, 345 and 795, to fall near the 
middle of the lowest and highest classes. This cannot always be 
achieved, but should be kept in mind in selecting a grouping 
scheme. 

d. Utilized lower indicated class limits 330, 360, . . . 780 which 
are divisible by the grouping interval 30. This makes it somewhat 
easier to tabulate the scores. 


Another suggestion should be added to the above. The grouping 
interval should be an odd number, provided its use does not result 
in violation of other desirable features, and provided 10, 20, 30, or 
some other multiple of 10 is not appropriate. Odd intervals simplify 
later computations. 

The technical terms and ideas relating to the frequency distribu- 
tion are described below: 


a. The grouping interval is called the class interval and is abbreviated
i. Although it is possible and sometimes useful to change the
interval for one or more classes, the resulting distribution demands
special interpretation and treatment. In general, i should
be and is constant for all classes.

b. The designations of classes, e.g., 330-359, 360-389, and so on, in
Table 2.3, are called the indicated class limits or the expressed or
written limits. Indicated class limits should be expressed as
simply as possible, but must be stated with sufficient precision
to prevent overlap. Thus, for the freshman semester averages,
Table II, Appendix B, if we use an interval of 5.0, we would need
to indicate our lowest class as 50.0-54.9, or 50.5-55.4. If the
measures to be tabulated are integral, the indicated class limits
should be integral; if in tenths, the limits should be in tenths,
and so on. Classes are sometimes indicated by their mid-value
or mid-point. (See d below.)

c. The real class limits (sometimes called implied, mathematical, or
boundary class limits) are always understood to extend 1/2 unit
above and below the indicated class limits. Thus, the real limits
of class 330-359 are 329.5-359.5; the real limits of the class
50.0-54.9 are 49.95-54.95; the real limits of 50.5-55.4 are
50.45-55.45. The indicated class limits are easy to write and easy
to use in tabulation. The real limits are, however, the limits
which must obtain if we are to consider our measures as continuous.
Real limits, not indicated limits, are used in statistical
computations. Expressed limits are for convenience only.

d. The mid-point or mid-value of a class is defined by 1/2 of the sum 
of the indicated limits of that class (or, what is the same thing,
1/2 of the sum of the real limits). Thus, the mid-point of the 
330-359 class is (330 + 359)/2 or 344.5. The mid-point of the 
class 50.0-54.9 is 52.45. When the class interval i is an odd num- 
ber, the mid-value will contain no more digits than the indicated 
class limits. If the latter are integral, the mid-value will be 
integral. 

The student may find that different writers follow different conventions,
but the above conventions are easy to follow and entirely
serviceable and logical. Obviously, a few variables such as ages 
taken as of last birthday yield measures which require special 
treatment in grouping. 
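
The definitions in b, c, and d above reduce to a little arithmetic, which may be sketched as follows. This is only an illustration: the function names and the `unit` parameter (the smallest step in which the measures are recorded) are assumptions introduced here, and exact fractions are used so that values such as 49.95 are not disturbed by rounding.

```python
from fractions import Fraction

def real_limits(lower, upper, unit=1):
    """Real limits lie 1/2 unit below and above the indicated limits.

    `unit` is the step in which measures are recorded: 1 for integral
    scores, "0.1" for measures recorded in tenths (an assumed parameter).
    """
    half = Fraction(unit) / 2
    return Fraction(lower) - half, Fraction(upper) + half

def mid_point(lower, upper):
    """Mid-point is 1/2 of the sum of the indicated limits."""
    return (Fraction(lower) + Fraction(upper)) / 2

def class_interval(lower, upper, unit=1):
    """The class interval i is the distance between the real limits."""
    low, high = real_limits(lower, upper, unit)
    return high - low

# Class 330-359 of Table 2.3 (integral scores).
print([float(x) for x in real_limits(330, 359)], float(mid_point(330, 359)))

# Class 50.0-54.9 of the semester averages (recorded in tenths).
print([float(x) for x in real_limits("50.0", "54.9", "0.1")],
      float(mid_point("50.0", "54.9")))
```

Passing the decimal limits as strings keeps the arithmetic exact; the interval for 330-359 comes out as 30 and that for 50.0-54.9 as 5, in agreement with the text.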

Construction of the Frequency Distribution. No definite 
rules for constructing a frequency distribution can be given. Most 
collections of data are unique in one or more ways, and the grouping 
scheme which is appropriate for one collection may not be appro- 
priate for another. There are, however, several general suggestions 
which help in making a frequency distribution of a given series. 
These have been touched on in the preceding pages; let us sum- 
marize them here: 

a. Find the range of the series to be grouped by subtracting the
lowest value from the highest.

b. Divide the range by the number of classes desired in order to
gain an idea of the size of the class interval needed. As a general
rule, if the series contains fewer than about 50 items, more than
about 10 classes are not justified; in fact, it is sometimes the
case that the characteristics of a short series stand out better
if fewer than 10 classes are used. If the series contains from about
50 to 100 items, 10 to 15 classes tend to be appropriate; if more
than 100 items, 15 or more classes tend to be appropriate. Ordinarily,
not fewer than 10 classes or more than 20 are used.

c. If the range divided by the number of classes gives a quotient 
which is near 5 or some multiple of 5, use 5 or the multiple 
as the class interval; if not, select for tryout the odd number 
which is nearest the quotient. If neither provides an appropriate 
number of classes, then take the even number which is nearest 
the quotient as the class interval. 

d. Fix the lowest class and indicate the limits of all classes, accord- 
ing to the suggestions given in the preceding pages. It ordinarily 
is preferable to fix class limits so that the lower expressed limits 
are divisible by the interval. This tends both to save time in 
tabulating the data and to prevent mistakes. If the items tend 
to cluster around certain values, however, the class limits should 
be fixed so that these values are at or near the mid-points of the 
classes. In a distribution of cafeteria checks, for example, if 5 
were selected as the class interval, the classes would ordinarily 
be fixed so that the mid-points of the intervals were multiples 
of 5, such as 8-12, 13-17, and so on.

e. Make a tally sheet and tally the items as illustrated below.

SCORE      TALLY MARKS   FREQUENCY
780-809    /                 1
750-779    //                2
720-749    ///               3
690-719    ///// /           6

f. Make a special purpose table, following the usual conventions,
from the first and third columns of the tally sheet.


It should again be emphasized that no thumb rule or mechanical
pattern can be followed in making a frequency distribution. It is
rarely the case that one particular grouping scheme can be said to
be the best among the many possible. Moreover, a scheme appropriate
for bringing out the stable characteristics of a given collection
of data may not be appropriate to use in computing statistics, such
as the mean, from the same data. All that can be said is that, in
constructing a frequency distribution, there should be good reasons
for the particular scheme adopted.
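
Steps a, b, d, and e of the summary can be sketched in a few lines. The sketch is a hedged illustration only: the function names are introduced here, the ten scores are hypothetical and are not taken from Table II, and the judgment called for in step c (choosing a convenient interval near the quotient) is left to the reader.

```python
from collections import Counter

def suggest_interval(scores, classes_wanted):
    """Steps a and b: range divided by the number of classes desired."""
    return (max(scores) - min(scores)) / classes_wanted

def frequency_distribution(scores, interval):
    """Steps d and e for integral scores.

    Lower indicated limits are made divisible by the interval, and the
    rows (lower limit, upper limit, frequency) run from the highest
    class down to the lowest, empty classes included.
    """
    counts = Counter((s // interval) * interval for s in scores)
    lowest = (min(scores) // interval) * interval
    highest = (max(scores) // interval) * interval
    rows = []
    for lower in range(highest, lowest - 1, -interval):
        rows.append((lower, lower + interval - 1, counts.get(lower, 0)))
    return rows

# Hypothetical scores, chosen only to exercise the functions.
scores = [345, 372, 401, 415, 455, 468, 490, 503, 507, 795]

quotient = suggest_interval(scores, 16)   # near 30, so 30 is tried
rows = frequency_distribution(scores, 30)
for lower, upper, f in rows:
    print(f"{lower}-{upper}  {'/' * f:<5} {f}")
```

With an interval of 30 the hypothetical scores fall into the same sixteen classes, 780-809 down to 330-359, that are used in Table 2.3.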


Exercises 


4. Tally the first 46 of the VAT scores, Table II, Appendix B, in the
classes 330-344, 345-359, . . . , 780-794. In a parallel frequency
column, tally the next 46, and in a third column, the last 46. Now
tally the first 46 in the classes 300-399, 400-499, 500-599, 600-699,
and 700-799; the next 46 in a parallel frequency column, and the last
46 in a third frequency column. Comment on the comprehensiveness,
informativeness, and stability of the two grouping schemes.

5. The open-end distribution is one in which the lowest or highest class
interval has no specified range; e.g., in a distribution of IQ's the
highest class may be indicated “140 and above.” What sort of data
necessitates the open-end distribution?

6. With reference to Table II, Appendix B, suggest several grouping 
schemes (stating which you prefer and why) for the chronological 
ages of the freshmen, Regents' language, and Regents' average scores. 
To prevent confusion in tabulation, how would the classes for the 
Regents' averages have to be indicated? 

7. Suppose you had ages as of last birthday for 100 individuals varying
from 10- to 20-year-olds. How would you group them? How would 
you interpret the class limits? 

8. Each row of the following chart refers to a grouping scheme for a
set of data. Fill in the blanks.

SIZE OF CLASS INTERVAL | MID-POINT | CONSECUTIVE SCORES INCLUDED IN INTERVAL | REAL LIMITS | EXPRESSED LIMITS
2 88.5
3 66 64.5-67.5
5 48-52
7 77
10 90-99
5 3.7
2.20-2.23

9. Examine the achievement test scores in Table I, Appendix B. Suggest
a grouping scheme for each set and be able to defend your suggestions.

10. Explain the statement, “The ‘typical’ frequency distribution of test
scores supports the idea that the more extreme a score is, the less
frequently it tends to occur.”


Graphical Presentation of Statistical Data 


More than 150 years ago, William Playfair, who claimed to be 
the inventor of the line graph, said of “charting” (Ref. 5, p. x): 


As the eye is the best judge of proportion, being able to estimate it
with more quickness and accuracy than any other of our organs, it fol- 
lows that wherever relative quantities are in question . . . this mode of 
representing . . . is peculiarly applicable; it gives a simple, accurate, 
and permanent idea by giving form and shape to a number of separate 
ideas, which are otherwise abstract and unconnected. 


Modern psychological investigations of the “visual-mindedness”
of most people have substantiated Playfair’s claim.

There are numerous graphical devices in use today. Few news- 
papers, periodicals, books, or technical reports are content to rely 
solely upon textual and tabular presentation of data. Graphical
devices catch the eye and permit comparisons of data at a glance.
In many situations, the fact that the information provided by
graphs is limited both in amount and in exactness is of little consequence.
Rough comparisons of simple aspects of the data often are
all that is desired.

There are several graphical devices which are particularly useful 
in analyzing and interpreting the frequency distribution. These 
will be taken up in a later section. At this point it is desirable to 
consider briefly the common types of graphs and the principles 
underlying their construction and interpretation. 

Common Types of Graphs. The many common graphical 
methods of presenting statistical data can be classified according to 
two broad purposes: (1) to contrast numbers, proportions, or per- 
centages of objects falling into two or more classes, and (2) to 
show how changes in one variable are related to changes in a second. 
For the first purpose, circle and bar graphs and pictographs are 
commonly used, and for the second, line graphs. 

Fig. 2.1 shows a circle graph of the data of Table 2.4. There are
various ways of bringing out details in circle graphs. The different
sectors may be shaded differently. Keys, legends, and supplementary
descriptive material may be included underneath or at
one side of the circle. Either the absolute numbers or the proportions
represented by the sectors ordinarily are inserted in the sectors,
the latter being favored if the absolute numbers are presented in an
accompanying table, as they should be. The circle graph is perhaps
the best graphical device available for contrasting the components
of a whole. It is simple, easily understood, and minimizes the danger
of distortion in the part-whole situation.

Fig. 2.1. Years of school completed by persons 25 years old and over
in the United States. (From Table 2.4.)

An ordinary cardboard protractor can easily be calibrated to permit
reading and plotting percentages directly, as shown in Fig. 2.2.
Such a protractor saves time and minimizes error in constructing 
circle graphs. 


TABLE 2.4
YEARS OF SCHOOL COMPLETED BY PERSONS
25 YEARS OLD AND OVER IN THE
UNITED STATES*

SCHOOL YEARS COMPLETED | NUMBER OF PERSONS | PER CENT
TOTAL                  | 82,578,000        | 100.0
Grade school
  1 to 4 yrs.          |  8,611,000        |  10.4
  5 and 6 yrs.         |  7,290,000        |   8.8
  7 and 8 yrs.         | 25,018,000        |  30.3
High school
  1 to 3 yrs.          | 13,487,000        |  16.3
  4 yrs.               | 16,926,000        |  20.5
College
  1 to 3 yrs.          |  5,533,000        |   6.7
  4 yrs. or more       |  4,424,000        |   5.4
Yrs. not reported      |  1,289,000        |   1.6

* Source: The World Almanac, p. 570, copyright 1950.
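
In the absence of a percentage protractor, each percentage may be converted to a sector angle, since 1 per cent of the circle corresponds to 3.6 degrees. The following is a minimal sketch of that conversion for the percentages of Table 2.4; the labels and the function name are introduced here for illustration only.

```python
# Percentages from Table 2.4; the labels are shortened for convenience.
percents = {
    "Grade school, 1 to 4 yrs.": 10.4,
    "Grade school, 5 and 6 yrs.": 8.8,
    "Grade school, 7 and 8 yrs.": 30.3,
    "High school, 1 to 3 yrs.": 16.3,
    "High school, 4 yrs.": 20.5,
    "College, 1 to 3 yrs.": 6.7,
    "College, 4 yrs. or more": 5.4,
    "Yrs. not reported": 1.6,
}

def sector_angles(percents):
    """Convert percentages of a whole to sector angles in degrees."""
    return {label: 3.6 * p for label, p in percents.items()}

angles = sector_angles(percents)
for label, angle in angles.items():
    print(f"{label:<28} {angle:6.1f} degrees")
print(f"Total {sum(angles.values()):.1f} degrees")
```

The angles should total 360 degrees, apart from rounding; a total that departs noticeably from 360 is a sign that a percentage has been miscopied.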


The bar graph ordinarily may be used interchangeably with the 
circle graph. In Fig. 2.3 the data of Table 2.4 are presented by means 
of a bar graph. Although the idea of parts of a whole is not brought 
out by a bar graph, the differences between the numbers in the 
classes are more pronounced. Circle and other area graphs, as well 
as volume graphs, tend to de-emphasize differences between quan- 
tities and, as a rule, should be used only when the idea of com- 
ponents of a whole is important. 

Fig. 2.2. The percentage protractor.

Fig. 2.3. Years of school completed by persons 25 years old and over
in the United States. (From Table 2.4.) (Horizontal scale: millions of persons.)

The use of the bar graph in presenting part of the data of Table
2.5 is illustrated in Fig. 2.4. It would be possible to contrast the
number of publicly controlled and the number of privately controlled
junior colleges by using two bars, side by side and shaded
differently, for each year, and another “double-bar” graph might
be used in presenting public and private college enrollments. To
present all of the data of Table 2.5 in a single graph would lead to
complexities which would make the graph difficult to read. The
amount of information which can be conveyed by a single graphical
device is quite limited, particularly so for the circle and bar graphs.

Fig. 2.4. Number of junior colleges in the United States, 1920 to 1948.
(From Table 2.5.)

Although the bar graph is widely used in presenting chronological
data, it is not appropriate if change or trend, rather than absolute
magnitudes, is of primary interest. Line graphs are peculiarly well
suited to show how changes in one variable are related to changes
in a second. If we desire to show trend in junior college education,
as indicated by numbers of colleges during the period 1920-1948,
the line graph illustrated in Fig. 2.5 is appropriate.

TABLE 2.5
JUNIOR COLLEGES: NUMBER AND ENROLLMENT FOR
CONTINENTAL UNITED STATES, 1920-1948*

     |     ALL SCHOOLS     | PUBLICLY CONTROLLED | PRIVATELY CONTROLLED
YEAR | NUMBER | ENROLLMENT | NUMBER | ENROLLMENT | NUMBER | ENROLLMENT
1920 |   52   |    8,102   |   10   |    2,940   |   42   |    5,162
1924 |  132   |   20,559   |   39   |    9,240   |   93   |   11,319
1928 |  248   |   44,855   |  114   |   28,437   |  134   |   16,418
1932 |  342   |   85,063   |  159   |   58,887   |  183   |   26,176
1936 |  415   |  102,453   |  187   |   70,557   |  228   |   31,896
1940 |  456   |  149,854   |  217   |  107,553   |  239   |   42,301
1944 |  413   |   89,208   |  210   |   60,884   |  203   |   28,324
1948 |  472   |  240,173   |  242   |  178,196   |  230   |   61,977

* Source: Statistical Abstract of the United States, 1953, p. 125.

It is possible to plot both the numbers of junior colleges and the
enrollments for the period 1920-1948 in the same graph, by using
percentages instead of absolute numbers. If we select 1920 as a base
year, the number of junior colleges in 1924 is 132/52 or about 254
per cent of the number in 1920; the number in 1928 is 248/52 or
about 477 per cent, and so on. The enrollment in 1924 is 20,559/8,102
or about 254 per cent of the enrollment in 1920, the enrollment in
1928 is 44,855/8,102 or about 554 per cent, and so on. The two percentage
curves are shown in Fig. 2.6. They are labeled in the graph;
they could be identified equally well by solid and dotted lines and a
key. The curves faithfully picture the original relationships, since
the plotted values are derived by multiplying each of the original
values by constants. The constant used for the number of colleges
is 100/52, and that for the enrollments 100/8,102. When we inspect
Fig. 2.6, we note that the enrollments have increased at a much
faster rate than the number of colleges.
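
The conversion to percentages of a base year can be checked with a short computation. The sketch below uses the "all schools" columns of Table 2.5; the function name and the rounding to whole per cents are choices made here for illustration.

```python
years = [1920, 1924, 1928, 1932, 1936, 1940, 1944, 1948]
colleges = [52, 132, 248, 342, 415, 456, 413, 472]
enrollment = [8102, 20559, 44855, 85063, 102453, 149854, 89208, 240173]

def percent_of_base(values, base_index=0):
    """Express each value as a whole percentage of the base-year value."""
    base = values[base_index]
    return [round(100 * v / base) for v in values]

pct_colleges = percent_of_base(colleges)
pct_enrollment = percent_of_base(enrollment)

for year, c, e in zip(years, pct_colleges, pct_enrollment):
    print(year, c, e)
```

Each series is multiplied by a single constant (100/52 or 100/8,102), so the shape of each curve is unchanged while the two become comparable on one vertical scale.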

Fig. 2.5. Number of junior colleges in the United States, 1920 to 1948.
(From Table 2.5.)

Fig. 2.6. Percentage increase in junior colleges and enrollments in
the United States, 1920 to 1948. Base year, 1920.

The line graph is widely used in statistical work. In general, it is
the best graphical method available of depicting changes over a
period of time. When percentages are used, the changes of two
or more variables can be contrasted, provided a common base year
can be selected which is appropriate for each of the variables. (See
exr. 16.)

The line graph is frequently useful in presenting and analyzing 
quantitative as well as chronological data. We shall return to it at 
several points in later pages. 

Graphs which utilize symbols, cartoons, and caricatures to repre- 
sent quantities are known as pictographs. Although pictographs are 
attractive to the eye, they are as a rule too inexact to be worth the 
time their construction requires. 

The Construction of Graphs. As in the case of tables, there
are no uniform rules or conventions covering the construction of 
graphs. Like other methods of organizing and presenting data, 
graphical methods are largely arbitrary and hence subjective. A 
good graph, like a good table, requires ingenuity and judgment. 

The most important thing to keep in mind in the construction of a 
graph is the thought of constructing a clear and accurate picture of
the features of the data singled out for special attention. Since the 
graph ordinarily makes a vivid impression, care also should be 
taken to make sure that the pictured features do not distort the 
meaning of the data as a whole. It is not difficult to find research 
reports in which the bulk of the data buried in tables would, if 
studied, contravene the impression created by the graphs based 
upon selected features of the same data. 

There is no “best” way of presenting data graphically. Any one 
of several methods may be equally good. Familiarity with graphical 
methods and their limitations plus a clear idea of what the graph is 
expected to accomplish are the best aids to selecting an appropriate 
device. 

Graphs, like tables, should be numbered consecutively and should 
have comprehensive titles. It is customary, however, to place num- 
bers and titles below graphs and to use lower case rather than upper 
case type. The source of the data should always be indicated after 
the title. Keys and supplementary descriptive material should be 
included as needed in understanding the graph. The practice of in- 
cluding numbers or percentages represented by sectors, bars, or 
points and of labeling certain parts on the graph itself is desirable,
provided it does not interfere with the simplicity and the visual 
pattern of the graph. It is of utmost importance that a graph be 
sufficiently complete to enable the reader to grasp the pictured 
idea without reference to the text itself. This is really the reason 
for using the graph. Graphs should, however, always appear close 
to the place in the text in which they are mentioned. 

The question regarding appropriate dimensions of bar and line 
graphs is important, because quite different impressions result when 
different ratios of height to width are used. A line graph, for example, 
can be given a different slope by changing either the vertical or 
horizontal scale. (See exr. 14.) Unfortunately, there is no definite
answer to the question regarding satisfactory dimensions. Gen- 
erally speaking, bar graphs will be most pleasing to the eye and will 
give a fair impression if their over-all height is about 3/4 of their 
over-all width. Little can be said about the dimensions of line graphs.
Since their apparent rise or fall is partly consequent to the scales 
employed, they must always be interpreted with care. The visual 
impression of the line graph depends partly upon the relation of the 
line to the zero point on the vertical scale. When zero is not included 
on the vertical scale, the visual impression is likely to be incorrect. 
In case the inclusion of zero results in a graph of unmanageable 
height, a horizontal break in the graph should be clearly shown. 

The following suggestions for graphical presentation were pre- 
pared by a committee of the American Society of Mechanical 
Engineers (Ref. 1): 


1. The general arrangement of a diagram should proceed from left to
right.

2. Where possible, represent quantities by linear magnitude, as areas
or volumes are more likely to be misinterpreted.

3. For a curve, the vertical scale, whenever practicable, should be so
selected that the zero line will appear in the diagram.

4. If the zero line of the vertical scale will not normally appear in the
curve diagram, the zero line should be shown by the use of a horizontal
break in the diagram.

5. The zero lines of the scales for a curve should be sharply distinguished
from the other co-ordinate lines.

6. For curves having a scale representing percentages, it is usually desirable
to emphasize in some distinctive way the 100 per cent line used
as a basis of comparison.

7. When the scale of the diagram refers to dates, and the period represented
is not a complete unit, it is better not to emphasize the first
and last ordinates, since such a diagram does not represent the
beginning and end of the time.

8. When curves are drawn on logarithmic co-ordinates, the limiting
lines of the diagram should each be of some power of 10 on the
logarithmic scale.

9. It is advisable not to show any more co-ordinate lines than are
necessary to guide the eye in reading the diagram.

10. The curve lines of a diagram should be sharply distinguished from
the ruling.

11. In curves representing a series of observations, it is advisable, whenever
possible, to indicate clearly on the diagram all the points representing
the separate observations.

12. The horizontal scale for curves should usually read from left to right
and the vertical scale from bottom to top.

13. Figures for the scale of a diagram should be placed at the left and
at the bottom or along the respective axes.

14. It is often desirable to include in the diagram the numerical data
or formulae represented.

15. If numerical data are not included in the diagram, it is desirable to
give the data in tabular form accompanying the diagram.

16. All lettering and all figures in a diagram should be placed so as to
be easily read from the base as the bottom, or from the right-hand
edge of the diagram as the bottom.

17. The title of a diagram should be made as clear and complete as possible.
Subtitles or descriptions should be added if necessary to insure
clearness.*


During recent years a great deal of attention has been given to the 
principles and techniques of graphical presentation of data. In the 
preceding pages we have touched upon only a few of the more ele- 
mentary aspects of the topic. The student who is interested in more 
detailed and comprehensive treatment of graphical methods may 
wish to consult Refs. 2, 3, and 4. 


* American Society of Mechanical Engineers, New York, Time-Series Charts:
A Manual of Design and Construction, Copyright 1938 by the Society and used
by their permission.


Exercises


11. The current expenditures (in millions of dollars) of state school systems
for 1949-1950, as given by Statistics of State School Systems, 1949-1950,
p. 22, were: general control, 220; instruction, 3,112; operation, 428;
maintenance, 214; auxiliary agencies, 452; fixed charges, 261. Present
the data by means of a circle graph. What feature of the data is clearly
brought out? What feature would be better brought out by a bar
graph? Can a line graph of the data be made? Would it serve any
useful purpose?

12. The enrollments in institutions of higher education for continental
United States, as given by Statistical Abstract of the United States,
1953, p. 125, were (in thousands): 1900, 237.6; 1910, 355.2; 1920,
597.9; 1930, 1,100.7; 1940, 1,494.2; 1950, 2,659.0. Construct a bar graph
and a line graph of the enrollments. What feature of the data does the
bar graph tend to emphasize? The line graph?

13. Change each of the enrollment figures in exr. 12 to a percentage of
enrollment in 1900 (base year) and construct a line graph, using the
percentages. How does the shape compare to the shape of the line
graph in exr. 12? What is the advantage of the percentage line graph?

14. With respect to the 1935-1939 average, the price indexes for moderate-
income families in large cities were: 1900, .57; 1910, .68; 1920, 1.43;
1930, 1.19; 1940, 1.00; 1950, 1.71. Show by line graphs, having different
vertical dimensions, that quite different visual impressions of price
trend can be given.

15. Find several examples of graphs in current newspapers or periodicals
which may be made to give different impressions if a different graphical
device is used, or if a different scale is used. Can a general statement
regarding a “best” method of graphing be made?

16. The numbers of high school and college graduates, by sex, as given by
the Statistical Abstract of the United States, 1953, p. 124, were, for
selected years:


YEAR OF HIGH SCHOOL COLLEGE 
GRADUATION MEN WOMEN MEN WOMEN 
1890 18,549 25,182 12,857 2,682 
1900 38,075 56,808 22,173 5,237 
1910 63,676 92,753 28,762 8,437 
1920 123,684 187,582 31,980 16,642 
1930 300,376 366,528 73,615 48,869 
1940 578,718 642,757 109,546 76,954 
1950 570,700 629,000 328,841 103,217 


Bring out what you believe to be the most striking features of these 
data by means of a single graphical device. 
Graphical Presentation of the Frequency Distribution


The bar and line graphs we discussed in the preceding section are
easily adapted to the frequency distribution. Their use in this connection
is important, both because it portrays the form of the distribution
and because it simplifies various rather complex points
in statistical theory. The three graphs most commonly used are
the histogram, frequency polygon, and cumulative frequency curve.

Fig. 2.7. Distribution of scores of 138 college freshmen on College
Board Verbal Aptitude Test. (From Table 2.3.) (Horizontal scale: score,
marked at the real limits 329.5, 359.5, . . . , 809.5; vertical scale: frequency.)

The Histogram. A histogram is essentially a bar graph of a frequency
distribution. Its purpose is to show the frequencies within
classes graphically. Consider the frequency distribution of Table 2.3.
We may mark off real limits on the scale of scores and construct
bars or rectangles whose bases are the class intervals and whose
heights are equal to the frequencies in the respective classes. This
procedure results in the histogram shown in Fig. 2.7. It will be
noted that, since the heights of the rectangles are proportional to
the frequencies and the bases of the rectangles are equal, the areas
of the rectangles correspond to the respective class frequencies and
the total area of the histogram to the total frequency, N, of the
distribution.
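
The correspondence between area and frequency can be verified numerically. The following sketch, offered only as an illustration, erects a rectangle on the real limits of each class of Table 2.3 and sums the areas; the function name and the tuple layout are assumptions made here.

```python
# (lower indicated limit, upper indicated limit, frequency) from Table 2.3.
table_2_3 = [
    (330, 359, 2), (360, 389, 1), (390, 419, 6), (420, 449, 7),
    (450, 479, 14), (480, 509, 24), (510, 539, 16), (540, 569, 14),
    (570, 599, 15), (600, 629, 8), (630, 659, 12), (660, 689, 7),
    (690, 719, 6), (720, 749, 3), (750, 779, 2), (780, 809, 1),
]

def histogram_rectangles(rows, unit=1):
    """Return (left real limit, right real limit, height, area) per class."""
    rects = []
    for lower, upper, f in rows:
        left, right = lower - unit / 2, upper + unit / 2
        rects.append((left, right, f, (right - left) * f))
    return rects

rects = histogram_rectangles(table_2_3)
total_frequency = sum(f for _, _, f in table_2_3)
total_area = sum(area for _, _, _, area in rects)
print(total_frequency, total_area, total_area / total_frequency)
```

Every area is 30 times its class frequency, so the total area is 30 times N; the constant factor is simply the class interval i.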

There are several ways of labeling the class intervals at the base
of the histogram, the commonest being that of indicating the mid-points.
The writer recommends, however, that the student follow
the practice of indicating the real limits, as illustrated in Fig.
2.7, until he becomes thoroughly accustomed to thinking about
the sides of the rectangles of the histogram as always erected
at the real limits of the class intervals.

If a histogram of a frequency distribution of a discrete series
is drawn, it is necessary to consider the classes as extending
1/2 unit above and 1/2 unit below their discrete values. A histogram
of the data of Table 2.2 would be constructed as shown in Fig. 2.8.

Fig. 2.8. Distribution of heads on 20 tosses of 5 coins. (From
Table 2.2.)

In the discrete series histogram, it is customary to indicate classes
by labeling their mid-points, but here, as for continuous series, the
student must remember that the sides of the rectangles in the histogram
are always drawn at the real limits or the assumed real limits of
the classes.

Although the histogram can be, and occasionally is, used in presenting
frequency distributions in research reports, it is primarily
useful as an aid to understanding statistical method. The simple
but never-to-be-forgotten fact that the areas of the rectangles of the
histogram correspond to frequencies in classes and the total area to
the total frequency of the distribution simplifies a great deal of statistical
theory.

The Frequency Polygon. If the mid-points of the upper bases
of the rectangles in a histogram are connected, the resulting figure
is called a frequency polygon, and is illustrated in Fig. 2.9.

Fig. 2.9. Histogram and frequency polygon of a distribution of 293
intelligence quotients. (From Table 2.6.)

It is customary to connect the mid-points of the upper bases of
the extreme left-hand and extreme right-hand rectangles to the mid-points
of the adjacent zero-frequency classes, and thus to close
the polygon. It is left as an exercise for the student to show that
the area of the closed polygon is equal to the area of the histogram,
and that consequently the area of the closed polygon of a distribution
corresponds to the total frequency N. Thus, either the area of the
histogram or the area of the closed polygon may be thought of as
graphically representing the total frequency of a distribution.
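
A numerical check, which is not a substitute for the general demonstration asked of the student, may be made with the frequencies of Table 2.6. The sketch below is an illustration under stated assumptions: the class interval is 7, the mid-points run from 52 to 129, and the area under the closed polygon is found by summing the trapezoids between successive vertices; the function name is introduced here.

```python
interval = 7
# Mid-points of the classes 49-55, 56-62, ..., 126-132, lowest first.
mid_points = [52 + interval * k for k in range(12)]
freqs = [1, 2, 2, 17, 27, 64, 61, 53, 31, 23, 11, 1]

def closed_polygon_area(mid_points, freqs, interval):
    """Area under the polygon closed at the adjacent zero-frequency classes."""
    xs = [mid_points[0] - interval] + mid_points + [mid_points[-1] + interval]
    ys = [0] + freqs + [0]
    # Sum of trapezoids between successive vertices.
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
               for k in range(len(xs) - 1))

polygon_area = closed_polygon_area(mid_points, freqs, interval)
histogram_area = interval * sum(freqs)
print(polygon_area, histogram_area)
```

Both areas come to 7 times 293. Each frequency enters two neighboring trapezoids with weight one-half, which suggests why the equality holds in general when the class interval is constant.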

The frequency polygon may also be thought of as a line graph
showing how frequency within class varies as class intervals take
on successively higher values on the scale of scores. In the polygon
of Fig. 2.9, for example, the fact that the frequencies in successive
classes increase up to the 83.5-90.5 class and then gradually decrease
is clearly depicted. Such interpretation is correct, because
the height of the polygon at any vertex corresponds to the frequency in
the class whose mid-point is directly below the vertex.

It is not necessary, of course, to construct the histogram before
constructing the frequency polygon. If dots are placed above the
mid-points of successive class intervals at a distance proportional
to the frequencies, the polygon may be drawn without the histogram.
A smoothed frequency polygon is the polygon constructed from
frequencies which have been smoothed by taking moving averages
of the frequencies. This matter will be discussed in a later section.

The Cumulative Frequency Curve. [t is possible to portray а 
frequency distribution by the cumulative frequency curve. The nature 


TABLE 2.6 


INTELLIGENCE QUOTIENTS OF 293 
EIGHTH-GRADE PUPILS 
(Data from table I, appendix B) 


FREQUENCY CUMULATIVE FREQUENCY 
10 F CUM f 
1 293 
11 292 
23 281 
105—111 31 258 
98-104. 53 227 
91-97 61 174 
84-90 64 113 
11-83 27 49 
70-16 17 22 
63—69 2 5 
56—62 2 3 
49—55 1 1 
М = 293 


of this curve can be made clear by an example. Consider the fre-
quency distribution of the IQ's of the eighth-graders as given in
Table 2.6. It will be noted that a column “cumulative frequency”
is included at the right of the frequency column. The cumulative
frequency up to the 56-62 class is 1; to the 63-69 class, 3; to the
70-76 class, 5; ... ; and to the 126-132 class, 292; the entire
cumulative frequency, of course, being 293, the total frequency N.
The idea of cumulative frequency up to each successive class means
that, if we are to treat the IQ's as continuous, we shall have to
utilize real limits. “Up to” the 70-76 class, for example, must mean
up to 69.5. The cumulative frequency curve for the distribution of
IQ's is shown in Fig. 2.10. The student will note that the dots are
placed over the real limits of the classes at heights corresponding to
the cumulative frequencies up to the successive real limits.
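
The plotting points of the cumulative frequency curve can also be worked out arithmetically. The following is a minimal sketch in Python, assuming the class intervals and frequencies of Table 2.6; the names `classes`, `freqs`, and `cumulative_points` are illustrative and are not part of the text.

```python
# Class intervals (integer limits) and frequencies of Table 2.6, lowest class first.
classes = [(49, 55), (56, 62), (63, 69), (70, 76), (77, 83), (84, 90),
           (91, 97), (98, 104), (105, 111), (112, 118), (119, 125), (126, 132)]
freqs = [1, 2, 2, 17, 27, 64, 61, 53, 31, 23, 11, 1]


def cumulative_points(classes, freqs):
    """Return (real limit, cumulative frequency up to that limit) pairs."""
    # No scores lie below the lower real limit of the lowest class.
    points = [(classes[0][0] - 0.5, 0)]
    running = 0
    for (low, high), f in zip(classes, freqs):
        running += f
        # The dot is placed over the upper real limit of each class.
        points.append((high + 0.5, running))
    return points


points = cumulative_points(classes, freqs)
for limit, cum in points:
    print(f"{limit:6.1f}  {cum:4d}")
```

Each pair gives a real limit on the scale of scores and the height of the dot above it; the pair (69.5, 5), for example, corresponds to the cumulative frequency up to the 70-76 class.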

The number of scores falling below a given value on the scale of 
scores may readily be estimated from the cumulative frequency 


Fig. 2.10. Cumulative frequency curve of a distribution of 293 intelli-
gence quotients. (From Table 2.6.) [Horizontal scale: intelligence
quotient, marked at the real limits 48.5 to 132.5; vertical scale:
frequency.]


curve. For example, suppose we wish to estimate the number of 
IQ's falling below 100 in the illustrative distribution. We find the
point on the curve in Fig. 2.10 directly above 100, go across horizon- 
tally to the frequency scale at the left, and read 195, approximately. 
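
The graphical reading may be checked by straight-line interpolation between the two plotted points on either side of the score. The sketch below is an illustration only: it assumes the plotting points implied by Table 2.6, and the function name `estimate_below` is hypothetical.

```python
# (real limit, cumulative frequency) points of the curve, from Table 2.6.
points = [(48.5, 0), (55.5, 1), (62.5, 3), (69.5, 5), (76.5, 22),
          (83.5, 49), (90.5, 113), (97.5, 174), (104.5, 227),
          (111.5, 258), (118.5, 281), (125.5, 292), (132.5, 293)]


def estimate_below(points, score):
    """Estimate the number of scores below `score` by linear interpolation."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= score <= x1:
            # Proportion of the way across the class, applied to its frequency.
            return c0 + (score - x0) / (x1 - x0) * (c1 - c0)
    raise ValueError("score lies outside the range of the distribution")


estimate = estimate_below(points, 100)
print(round(estimate, 1))
```

Interpolation between 97.5 and 104.5 gives a figure of about 193, which agrees with the approximate reading of 195 taken from the graph.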

We shall return to the cumulative frequency curve in Chapter 
IV. There we shall see that, when the cumulative frequencies are 
changed to percentages of total frequency, a cumulative percent- 
age curve of wide usefulness may be constructed. 

Construction of Histograms, Frequency Polygons, and 
Cumulative Frequency Curves. Most of the rules generally to be 
observed in the construction of these figures have already been 
mentioned. It should be emphasized again, perhaps, that the real
limits of the class intervals are used in the construction of the
histogram and the cumulative frequency curve and the mid-points
in the construction of the polygon.

As for all graphs, clear and comprehensive titles should be in-
cluded beneath the figures. The source of the frequency distribution
should always be cited. Both horizontal and vertical scales should
be clearly labeled. The vertical sides of the rectangles of the histo-
gram may be omitted if the form of the distribution, rather than
the frequencies in particular class intervals, is being emphasized.

As a rule, if the height of the highest rectangle is about 3/4 of the
total width of the histogram, the histogram will be more pleasing
to the eye than if very different proportions are used. The same can
be said about the ratio of the over-all height to the over-all width
of the polygon. The cumulative frequency curve should ordinarily
make, on the average, an angle of 40° to 50° with the scale of scores.


In general, the same principles apply to the construction and 
interpretation of graphs of frequency distributions as to other 
graphs. 


Exercises 


17. Toss 7 coins 50 times and make a histogram of the frequencies of oc-
currence of 7, 6, . . . , 1, 0 heads. (Cf. p. 396.)

18. Consider the histogram in Fig. 2.7.

a. Does the area represent the total number of cases?

b. What percentage of the total number of cases does each rectangle
represent? What is the sum of the percentages?

c. Through what point approximately on the scale of scores would you
draw a vertical line to bisect the histogram? How many scores would
lie below this point? Above?

d. Through what points, approximately, on the scale of scores would
you draw vertical lines to divide the histogram into quarters?

e. Will the relationships in (b) above hold for all histograms? Explain.

19. Show that the area of the “closed” frequency polygon is equal to the
area of a histogram.

20. The distributions of the VAT and MAT scores, Table II, Appendix B,
are shown below. Construct frequency polygons of both on a common
scale of scores. Construct cumulative frequency curves on a common
scale. Does either graph emphasize features of the distributions not
emphasized by the other? Explain.

SCORE        VAT    MAT
780-809        1
750-779        2      1
720-749        3      2
690-719        6      3
660-689        7      8
630-659       12     12
600-629        8     13
570-599       15      1
540-569       14     18
510-539       16     18
480-509       24      T
450-479       14     16
420-449        7     10
390-419        6      2
360-389        1      1
330-359        2
TOTAL        138    138


21. The mental ages of groups of sixth-, seventh-, and eighth-grade pupils
are shown below, those of the eighth-graders being taken from Table
I, Appendix B. Construct frequency polygons and cumulative fre-
quency curves on common bases. (The differences between numbers in
the groups are not great enough to invalidate rough comparisons.)
State several items of information conveyed by the graph of the three
frequency polygons; by the graph of the three cumulative frequency
curves.


MENTAL AGES OF SIXTH, SEVENTH, AND EIGHTH
GRADE PUPILS IN A CITY SCHOOL

                   FREQUENCIES
MENTAL AGES     6th    7th    8th
90-99             9
100-109          25      5      1
110-119          49     26      4
120-129          64     52     25
130-139          62     61     38
140-149          36     60     62
150-159          20     39     53
160-169          12     27     48
170-179           4     14     27
180-189           5      6     21
190-199                  3      9
200-209                  4      2
210-219                  1      2
220-229                         1
TOTAL           286    298    293

22. What characteristic of a frequency distribution accounts for an
S-shaped cumulative frequency curve? Find or invent a distribution
whose cumulative frequency curve is not S-shaped.


Errors in the Frequency Distribution


There are two kinds of errors likely to be present in a given fre-
quency distribution. The first, commonly called the grouping error,
is consequent to the classification of a continuous series; the second 
is due to sampling fluctuations and may be termed the sampling 
error. 

Errors of Grouping. To illustrate errors of grouping, let us re- 
turn to the frequency distribution of Table 2.3. The seven scores 
tabulated in the 420—449 class had original values of 420, 424, 434, 
439, 444, 444, and 445, respectively. Now if we assume that the 
characteristic or representative score of the class is the mid-value 
434.5, the amounts by which the scores differ from 434.5 are errors 
occasioned by grouping. Subtracting 434.5 from each score, we get 
-14.5, -10.5, -.5, +4.5, +9.5, +9.5, and +10.5. The algebraic
sum of these errors is +8.5, so here, as is generally the case when a 
class contains several scores, the errors tend to be compensating 
within the class. Grouping errors also tend to be compensating over 
the entire distribution, since the sums in some classes usually are 
negative and in others positive. 
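
The arithmetic of the illustration is easily verified. The following sketch in Python uses the seven original scores quoted above; the variable names are illustrative only.

```python
# The seven original scores tabulated in the 420-449 class.
scores = [420, 424, 434, 439, 444, 444, 445]

# Mid-value of the class, taken as its representative score.
low, high = 420, 449
midpoint = (low + high) / 2  # 434.5

# Error occasioned by grouping: each score minus the class mid-value.
errors = [score - midpoint for score in scores]
algebraic_sum = sum(errors)

print(errors)
print(algebraic_sum)
```

The individual errors run as large as 14.5 in absolute value, yet their algebraic sum is only +8.5, which is the sense in which the errors tend to be compensating within the class.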

Grouping errors are usually present in the distribution of a con- 
tinuous series. As we shall see later, they affect the value of certain 
statistics computed from grouped data. The point to note here is 
that when we group continuous measures, we assume, in effect, that 
the measures have the values of the mid-points of their respective 
classes. The extent to which this assumption is not satisfied is the
extent to which errors of grouping are present. 

There is no way to eliminate errors of grouping, although in the
computation of the standard deviation, to be discussed in Chapter
IV, a correction for the effect of the errors is available. As a rule
their effect tends to be lessened by considered choice of the grouping
scheme.

Sampling Error; Moving Average. When a sample of relatively 
few measures from a population is taken, the grouped measures 
may show irregularities due to sampling fluctuations which would 
tend to disappear if more measures were added. 


TABLE 2.7
RESULTS OF REPEATING 36 THROWS OF 2 DICE 4 TIMES

                              FREQUENCIES
SUM OF     1st 36    2nd 36    3rd 36    4th 36    Combined        Expected
SPOTS      Throws    Throws    Throws    Throws    Distribution    Distribution

12 1 T 2 4
11            2         3         3         2          10               8
10            1         5         4         0          10              12
 9            3         7         5         3          18              16
 8            9         2         6         4          21              20
 7            5        10         8         2          25              24
 6            8         2         3         5          18              20
 5            2         4         3         8          17              16
 4            5         0         2         3          10              12
 3                      2         2         4           8               8
2 1 1 5 1

TOTALS       36        36        36        36         144             144


In Table 2.7 the results of throwing 2 dice 36 times on 4 occasions 
are recorded. It will be noted that each of the four distributions 
has different sorts of irregularities and that none approximates 
very closely the combined distribution. The latter, although showing 
some irregularity, resembles the expected distribution to a greater 
extent than do any of the four distributions which comprise it. 
Generally speaking, the larger the sample the better it portrays 
the characteristics of the population, since the addition of cases 
tends to eliminate sample irregularities. The tendency is further 
demonstrated in Table 3.6, p. 98. 
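
The expected distribution for throws of 2 dice follows from counting the 36 equally likely combinations of spots. A brief sketch in Python, under the assumption of fair dice, is given here; the function name `expected_frequencies` is illustrative.

```python
from collections import Counter
from itertools import product


def expected_frequencies(n_throws):
    """Expected frequency of each sum of spots in n_throws of 2 fair dice."""
    # Count the ways each sum can occur among the 36 equally likely pairs.
    ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: n_throws * count / 36 for total, count in sorted(ways.items())}


expected = expected_frequencies(144)
for total, freq in expected.items():
    print(total, freq)
```

For 144 throws the expected frequencies rise from 4 at a sum of 2 to 24 at a sum of 7 and fall again to 4 at a sum of 12, against which the combined observed distribution may be compared.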

It is sometimes desirable to “smooth” an observed distribution, a
procedure which presumably reduces the effect of sampling fluctu-
ations upon frequencies in the classes. Consider the distribution of
freshman aptitude scores of Table 2.3, now included in Table 2.8.
If we are interested in the probable effect upon the form of the 


TABLE 2.8
SMOOTHED FREQUENCY DISTRIBUTION
OF SCORES OF 138 COLLEGE FRESHMEN
ON COLLEGE BOARD VERBAL APTITUDE
TEST

              OBSERVED     SMOOTHED
SCORE         FREQUENCY    FREQUENCY

840-869           0
810-839           0            .3
780-809           1           1.0
750-779           2           2.0
720-749           3           3.7
690-719           6           5.3
660-689           7           8.3
630-659          12           9.0
600-629           8          11.7
570-599          15          12.3
540-569          14          15.0
510-539          16          18.0
480-509          24          18.0
450-479          14          15.0
420-449           7           9.0
390-419           6           4.7
360-389           1           3.0
330-359           2           1.0
300-329           0            .7
270-299           0

TOTAL           138         138.0


distribution of adding scores of freshmen from the same or a com-
parable population, we may adjust or smooth the observed fre-
quencies and thereby reduce irregularities. One of the better
methods of smoothing a distribution is that of the moving average.
The process of taking moving averages by 3's is illustrated in
Table 2.8. It will be noted that two zero frequency classes are
appended at the top and bottom of the observed distribution. The
smoothed frequency for the 810-839 class is obtained by adding
the frequency in that class to the frequencies in the classes just
above and just below it and dividing by 3. Thus, (0 + 0 + 1)/3 = .3.
The smoothed frequency for the 780-809 class is (0 + 1 + 2)/3 = 1;
that for the 750-779 class is (1 + 2 + 3)/3 = 2; and so on. The
student should verify the correctness of the smoothed frequencies
shown in Table 2.8. Since the procedure involves the addition of
1/3 of each frequency 3 times, the total of the smoothed frequencies
is equal to the total of observed frequencies or N. (Note: If it is
impossible to have classes above and below observed classes, as in
the distribution of Table 2.7, the smoothed frequencies of the two
nonexistent classes may be combined with the smoothed frequencies
of the top and bottom classes, respectively; in fact, some writers
recommend that such combinations be made for all distributions.)
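
The moving average by 3's may be carried out mechanically. The following is a minimal sketch in Python, assuming the observed frequencies of Table 2.8 with the two zero frequency classes appended at each end; the function name `moving_average_by_3` is illustrative.

```python
# Observed frequencies of Table 2.8, highest class (840-869) first,
# including the two zero frequency classes appended at top and bottom.
observed = [0, 0, 1, 2, 3, 6, 7, 12, 8, 15, 14, 16, 24, 14, 7, 6, 1, 2, 0, 0]


def moving_average_by_3(freqs):
    """Average each class with the classes just above and just below it.

    The outermost class at each end has only one neighbor and is left out,
    so the result begins with the second class and ends with the next to last.
    """
    return [(freqs[i - 1] + freqs[i] + freqs[i + 1]) / 3
            for i in range(1, len(freqs) - 1)]


smoothed = moving_average_by_3(observed)
print([round(value, 1) for value in smoothed])
print(round(sum(smoothed), 1))
```

The first smoothed value belongs to the 810-839 class and the last to the 300-329 class. Because each observed frequency contributes one third of itself to three neighboring classes, the smoothed frequencies sum to 138, the total of the observed frequencies.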

The smoothed frequencies result from averages taken over three 
intervals. It would be possible to take averages over five intervals 
or any other odd number, but rarely advisable. In fact, only when 
there is good reason for believing the observed irregularities are 
due only to sampling fluctuations should the moving average over 
three intervals be taken. The moving average may obscure signifi- 
cant as well as accidental irregularities, and the greater the number 
of intervals included in the averaging process the greater this 
possibility. 


Exercises 


23. Explain in terms of errors of grouping why it is advisable to adjust 
grouping schemes so that the extreme measures tend to fall near the 
middle of the highest and lowest classes. 

24. Demonstrate a situation in which errors of grouping would be non-
compensating, even though there were a great many measures in the
class.

25. As the grouping interval is made smaller, what will be the effect upon 
the size of the errors of grouping? 

26. In column I of the blank form below, tabulate the IQ's of pupils
numbered 000, 010, 020, . . . , 290, Table I, Appendix B. In column
II, pupils numbered 005, 015, . . . , 285. In column III, pupils
numbered 008, 018, . . . , 288. Combine the distributions in column
IV. Which resembles the distribution of the 293 more closely? Examine
the effect of a moving average by 3's upon each of the four distributions.


SCORE         I     II    III    IV    TOTAL
126-132                                   1
119-125                                  11
112-118                                  23
105-111                                  31
98-104                                   53
91-97                                    61
84-90                                    64
77-83                                    27
70-76                                    17
63-69                                     2
56-62                                     2
49-55                                     1


27. Construct a histogram of the smoothed frequency distribution in Table
2.8 and compare it with the histogram in Fig. 2.7. 


Types of Frequency Distributions 


The great majority of frequency distributions encountered in 
educational measurements have in common an important character- 
istic. This characteristic is idealized in the statement, the more 
extreme a deviation from the average value, the less frequently it appears.
The frequency polygons of such distributions tend, of course, to 
have single peaks and to slope more or less uniformly downward 
from the peaks. 

The single-peaked or bell-shaped type of distribution was found 
to characterize errors of observation in the physical sciences near 
the beginning of the nineteenth century, but it was not until the 
latter part of the century that the distribution was found to char- 
acterize the measurements of certain economic and social variables. 
The Belgian statistician, Quetelet, appears to have been the first 
person to advance the idea that mass observational data, from 
various sources, tended to be distributed according to the “law of 
error.” The great English scientist, Sir Francis Galton, was so 
impressed by the tendency that he wrote (Ref. 6, p. 66): 


I know of scarcely anything so apt to impress the imagination as the 
wonderful form of cosmic order expressed by the “Law of Frequency of 
Error." The law would have been personified by the Greeks and deified, 
if they had known it. It reigns with serenity and in complete self-efface-
ment amidst the wildest confusion. The huger the mob and the greater
the apparent anarchy, the more perfect is its sway. It is the supreme law
of Unreason. Whenever a large sample of chaotic elements are taken in 
hand and marshalled in the order of their magnitude, an unsuspected 
and most beautiful form of regularity proves to have been latent all along. 


Both Quetelet and Galton believed that most physical and mental 
variables, when reliably and appropriately measured, would be 
found distributed according to the “normal curve of error," or 
approximately so. 

The Normal Distribution. The normal frequency distribution, 
whose smoothed polygon is the so-called normal curve, is the back- 
bone of statistical theory, and we shall consider its elementary theo- 
retical and practical applications at some length in later chapters. 


TABLE 2.9

NORMAL DISTRIBUTION OF 400 SCORES
(From Table III, Appendix B)

SCORE      FREQUENCY
68-72           1
63-67           4
58-62          11
53-57          26
48-52          49
43-47          69
38-42          80
33-37          69
28-32          49
23-27          26
18-22          11
13-17           4
8-12            1


In this section, we wish only to note some of its characteristics 
as a type of distribution which empirical data frequently tend to 
follow. Since such data are rarely, if ever, entirely normal in form, 
the normal curve is a mathematical ideal. 

Let us examine the shape of the frequency polygon of a normal 
distribution. The 400 normally distributed scores of Table III, 
Appendix B, are grouped in Table 2.9. The histogram and polygon 
of the normal distribution of Table 2.9 are shown in Fig. 2.11.
If a smooth curve were sketched to fit the polygon in Fig. 2.11 as
closely as possible, and to approach but not touch the base line, the 
curve would closely resemble the normal curve. The important 
things to note about the shape of the curve are its symmetry and 
its degree of peakedness, both of which are quite similar to those 
of an outline of a typical bell. 


Fig. 2.11. Histogram and frequency polygon of a normal distribution.
(From Table 2.9.) [Vertical scale: frequency.]


As was noted above, empirical data rarely, if ever, yield a truly 
normal distribution. They tend to show systematic departure from 
normal form either in respect to symmetry or to peakedness, or to 
both, although in many instances the departure can be considered 
the result of sampling fluctuations and not, therefore, necessarily in 
contradiction to an assumption of a normally distributed population. 

The Skewed Distribution. The word skewed means “lacking
symmetry” or “distorted.” The meaning of skewness as applied 
to frequency distributions can be most clearly brought out by 
illustrations. 

The frequency polygon of Fig. 2.12 is particularly lacking in sym- 
metry on the left side and is considered to have negative skewness. 
The polygon of Fig. 2.13 shows positive skewness. The student will 
find that taking a moving average of the data represented by either 
polygon will not eliminate the marked lack of symmetry.


Fig. 2.12. A negatively skewed distribution. (Scores of 47 students on a
generalization test.) [Horizontal scale: score, marked from 64.5 to
154.5.]


Fig. 2.13. A positively skewed distribution. (A pupil's speed of re-
sponse to 58 mental test items.) [Horizontal scale: seconds; vertical
scale: frequency.]


Distributions showing a systematic departure from normal curve 
symmetry are said to be skewed. In such distributions, the variation 
of the measures is considerably greater near one extreme of the 
scale than the other. 

The Leptokurtic and the Platykurtic Distributions. The
word kurtosis refers to the relative “width of shoulders” or “degree
of peakedness” of a frequency distribution, that of the normal
distribution being described as mesokurtic (mesos means “middle”
or “medium”). Relatively high and narrow distributions are de-
scribed as leptokurtic; relatively flat-topped distributions as platy-
kurtic. The two types of distributions are illustrated in Fig. 2.14
and 2.15. Either type, of course, may show skewness as well as
nonnormal peakedness.


Fig. 2.14. A leptokurtic distribution. (Length in centimeters of 100
books selected at random from a library shelf.)


Fig. 2.15. A platykurtic distribution. (Scores of 293 pupils on an
Arithmetic Problems Test. From Table I, Appendix B.) [Horizontal
scale: score, marked from 0.5 to 28.5.]

Since the apparent kurtosis of a frequency polygon is affected by
the choice of the dimensions used in its construction, the departure
of a given frequency distribution from normality with respect to 
peakedness is more difficult to detect by inspection than is skewness. 
In later chapters we shall consider quantitative methods of describ- 
ing skewness and kurtosis and methods of determining whether the 
departure from normality in a given distribution is too great to be 
reasonably ascribed to sampling fluctuations. 

Other Forms of Distributions. In the above paragraphs, we 
have considered only those distributions whose polygons have single 
peaks with sides sloping downward from the peaks.

Some empirical data show two or more peaks, some suggest J-type 
curves, some U-type curves, and some tend to show little or no 
regularity. In research work, the type of distribution characterizing 
given data is always of major concern. Many of the standard sta- 
tistical techniques taken up in the following pages presuppose 
normality of data and may give erroneous results if the data are 
markedly nonnormal. 

At this point in his study of statistics, the student is encouraged 
to develop an attitude of thoughtful skepticism regarding easy 
and hasty assumptions of normality. At the same time he should 
note the rather remarkable frequency with which collections of data 
tend to be characterized by the normal distribution.


Exercises 


28. What type of distribution would be likely to characterize age of people 
at time of marriage? 

29. Suppose a 50-item test of low difficulty were administered to a group
of 100 students. What type of distribution most likely would charac- 
terize the scores? Suppose the items were of high difficulty? Suppose 
25 of the items were of low difficulty and 25 of high difficulty? 

30. Suppose the amount of pupil or worker tardiness in a school or factory 
were recorded by noting the number of tardies and the minutes of each. 
If the number of tardies were plotted on a vertical scale and minutes 
on the horizontal, what do you believe would be the shape of the 
distribution?


Chapter III

Characteristics of Statistical Series. Central Tendency


WE HAVE seen that the first step in the reduction and description 
of a long quantitative series is that of classifying the values in a 
frequency distribution and constructing frequency diagrams, such 
as the histogram and the polygon. 

It is usually the case that statistical work involves a comparison 
of one series with one or more others or with theoretical values, 
such as standardized test norms. For example, when we have in 
hand such series as those shown in Table 3.1, we ordinarily would 
wish to know how the schools compare as to performance on the 
educational test, or how the entire sample of 293 eighth-grade 
pupils compares with eighth-grade pupils of previous years in 
the same city or with pupils in other cities, or with eighth-grade 
pupils at large, as given by state or regional norms for the test. 
The questions we attempt to answer by statistical methods usually 
involve comparisons of two or more series. 

When the series to be compared are classified in frequency dis- 
tributions or are graphically depicted in histograms or frequency 
curves, points of similarity and difference may be noted roughly by 
inspection. If we inspect the distributions of Table 3.1 and the 
polygons of Fig. 3.1, we note that the distributions of Schools E 
and F seem to represent the best performances on the test and the 
distribution of School G perhaps the poorest. But we further note 
both that there is considerable overlap of the distributions and that 
each distribution has several unique features. Comparisons by
inspection tend to be inexact and inconclusive, and it is difficult
to obtain agreement regarding their meaning. As a rule, quanti-
tative methods of comparing frequency distributions are more satis-
factory than graphical methods. Let us consider the four important
ways in which frequency distributions may differ. 

The Four Major Characteristics of the Quantitative Series. 
Quantitative series may differ in one or more of four important 


TABLE 3.1 


FREQUENCY DISTRIBUTIONS OF ARITHMETIC FUNDAMENTALS 
TEST SCORES 
(Data from table I, appendix B) 


SCHOOL ALL 

и. ЭБ Газ ру АЕ о ты, 
51-53 1 1 
48-50 тт 1 3 
45-47 Ty 10) З 0 5 
42-44 О OWN Тас shoe Ж 19 
39-41 1 ОРТ ТЕЕ т 18 
36-38 | 2 ЗЭ тА рро БУА 31 
БӨБ р, 107. Т! сла Bec ngo dr der peces, 87 
505ге A or. О S" И а 49 
РО ООЗУ ИГО мою а 4l 
РЯ О ш MEAE TA ое а 31 
21-23 Е О 2]: 0.97 s 19 
а erede c LI Тәл Love ead 16 
15-9 4 4 1 12 
TE 2 1 4 
2 1 0 3 
0 1 1 
х 3 3 
NUMBER 29 18-738 37 357 385 3720 177 20 293 



respects: (1) average value of the items or central tendency, (2) the 
scatter of the items about the average value or variability, (3) 
degree of asymmetry in the scatter of the items or skewness, and 
(4) the extent of scatter of the items in the neighborhood of the 
average value or kurtosis. In practical work, two or more comparable
series ordinarily differ to some extent in all four respects, if only
because of sampling fluctuations. Various combinations of differ-
ences between series may be seen in Fig. 3.1, 3.2, 3.3, and 3.4. In
the last three figures minor irregularities have been smoothed out
so that major differences stand out clearly.


Fig. 3.1. Frequency polygons of distributions A (-), B (++), E (--),
and C (- -) from Table 3.1. (Class frequencies are expressed as relative
frequencies or proportions.)


Fig. 3.2. Polygons of two frequency distributions having similar
variability, skewness, and kurtosis, but unequal means. [Horizontal
scale: score; vertical scale: frequency.]


Fig. 3.3. Polygons of two symmetrical frequency distributions having
equal means but dissimilar variability and kurtosis.

In order to compare two or more series exactly, we need measures 
of their four major characteristics. There are situations in which 
variability, skewness, and kurtosis, particularly if the series are 
short, are not considered; and, at the other extreme, situations in 
which significant detail would be lost if we attempted to describe 
series only in terms of their four major characteristics. In the former 
situations, the implicit assumption is that the variability, skewness,
and kurtosis of the series are not so different as to invalidate the
comparisons; in the latter, it may be doubted whether exact
comparisons are desirable. Ordinarily, we may consider measures of
central tendency, variability, skewness, and kurtosis necessary and
sufficient in describing and comparing quantitative series. These
measures, in addition to providing the basis for exact comparisons
of two series, are indispensable in the analysis and interpretation
of a single series.

Measures of Central Tendency. It was seen in the previous 
chapter that the items in a quantitative series, classified in a fre- 
quency distribution, tend to cluster about a point somewhere 
between the extremes, and the tendency again is seen in the table 
and figures above. This tendency commonly is referred to as the 
central tendency of the series, and the point about which the items 
tend to cluster is called a measure of central tendency. 

A measure of central tendency is a sort of average or typical 
value of the items in the series, and its function is to summarize the 
series in terms of this average value. It is a denominate quantity, 
being expressed in the same unit as the items. 

Since the central tendency of observational data tends to be
relatively stable from sample to sample taken from the same popu-
lation, a measure of central tendency has more than summarizing
and descriptive usefulness. As will be seen in Chapter VIII, measures
of central tendency are frequently used in determining whether two
samples differ so greatly as to discredit the hypothesis that they
belong to the same population.


Fig. 3.4. Polygons of two frequency distributions having equal
means, variability, and kurtosis, but dissimilar skewness. [Horizontal
scale: score; vertical scale: frequency.]

There are five measures of central tendency in common use: mode,
median, arithmetic mean, geometric mean, and harmonic mean.
Since each of these measures or “averages” involves somewhat
different techniques and interpretation, each will be treated sepa-
rately in the following pages. As he learns about each, the student
should keep in mind that no single number can adequately describe
a statistical series, and that consequently two or more series cannot
be fairly compared on the basis of average values alone. In the past,
educational workers in particular have tended to be preoccupied
with average values to the exclusion of the other important features
of their data.


The Mode 


The mode may be defined as the item which occurs most fre-
quently in a statistical series. When a particular type of wearing
apparel, such as a suit or dress, is worn more frequently than other
types during the fall season, that particular type of apparel is re-
ferred to as the mode for the fall season. If the majority of college
graduates enter professional careers, the modal occupation of college
graduates is professional. If we have the ten scores, 24, 22, 20, 30,
29, 22, 26, 28, 22, 25, the modal score is 22, since 22 is the score
appearing most often. In the series 20, 21, 24, 25, there obviously
is no modal score. The abbreviation commonly used for the mode
is Mo.
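
The definition can be applied by a simple tally. The sketch below, in Python, uses the two short series just given; the function name `modes` is illustrative, and the convention of reporting no mode when every item occurs only once is an assumption adopted to match the second example.

```python
from collections import Counter


def modes(series):
    """Return the item or items occurring most often, or [] if none repeats."""
    counts = Counter(series)
    highest = max(counts.values())
    if highest == 1:
        # Every item appears once only, so there is no modal value.
        return []
    return sorted(item for item, count in counts.items() if count == highest)


ten_scores = [24, 22, 20, 30, 29, 22, 26, 28, 22, 25]
short_series = [20, 21, 24, 25]

print(modes(ten_scores))
print(modes(short_series))
```

The first call reports 22 as the single modal score; the second reports none.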

The Mode in a Frequency Distribution. The mode* in a fre-
quency distribution is considered to be at the mid-point of the class
interval containing the greatest number of cases. In Table 3.1, the
mode for the distribution of scores in School D is 25, the mid-point
of the interval 24-26, which contains the greatest number of scores.
The mode for the School F distribution is 43. The School E dis-
tribution does not have a single mode, but two modal values, one
at 37, the other at 43. We shall consider distributions having no
single mode under the next heading.

* The mode as defined here is sometimes called the crude or empirical mode
to distinguish it from the mathematical mode. In statistical theory, the latter
is defined as the abscissa corresponding to the highest point of a theoretical
frequency curve.

The “All-Schools” distribution in Table 3.1 has a marked mode
at 31. The distribution of mental ages shown in Fig. 3.5 has a
marked mode at 144.5. When a distribution comprising a sub-
stantial number of cases (say more than about 30) shows a single
mode or one prominent mode among several minor modes, the
mode usually is a useful, though somewhat crude, measure of
central tendency. The student may well ask how he can tell whether
a mode is minor. Although no general answer can be given, it is
usually safe to conclude that a mode is minor if a slight change of
grouping scheme or a single moving average will eliminate it.


Fig. 3.5. Distribution of mental ages of 293 pupils. (From Table I,
Appendix B.) [Horizontal scale: mental age.]

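
The test for a minor mode may be tried numerically. The following Python sketch is an illustration under stated assumptions: the frequencies are those of the All-Schools column of Table 3.1 as read here (the 33-35 entry is taken as 37, which makes the total 293), and the helper names are hypothetical. The smoothing keeps the same classes and pads each end with a zero frequency, a simplification of the procedure of Table 2.8.

```python
# Class intervals and frequencies, lowest class first (assumed reading of
# the All-Schools column of Table 3.1; total 293).
classes = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 20), (21, 23),
           (24, 26), (27, 29), (30, 32), (33, 35), (36, 38), (39, 41),
           (42, 44), (45, 47), (48, 50), (51, 53)]
freqs = [3, 1, 3, 4, 12, 16, 19, 31, 41, 49, 37, 31, 18, 19, 5, 3, 1]


def midpoint(interval):
    low, high = interval
    return (low + high) / 2


def crude_mode(classes, freqs):
    """Mid-point of the class containing the greatest number of cases."""
    best = max(range(len(freqs)), key=lambda i: freqs[i])
    return midpoint(classes[best])


def local_modes(classes, freqs):
    """Mid-points of interior classes holding more cases than both neighbors."""
    return [midpoint(classes[i]) for i in range(1, len(freqs) - 1)
            if freqs[i] > freqs[i - 1] and freqs[i] > freqs[i + 1]]


def smooth_by_3(freqs):
    """Moving average by 3's over the same classes, ends padded with zero."""
    padded = [0] + list(freqs) + [0]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]


print(crude_mode(classes, freqs))
print(local_modes(classes, freqs))
print(local_modes(classes, smooth_by_3(freqs)))
```

On these figures the observed distribution shows peaks at 31 and 43, but a single moving average leaves only the peak at 31, so that the mode at 43 would be judged minor by the rule just stated.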
Distributions Having No Single Mode. It was noted above 
that a distribution may have more than one mode. If two different 
intervals in a distribution each contains a larger number of scores 
than are found in adjacent intervals, the mid-point of each of these 
intervals is referred to as a mode, and the distribution is called
bimodal. If there are more than two modes, the distribution is
described as multimodal.

The distribution shown in Fig. 3.6 appears to have several modes, 
as does the distribution of Fig. 3.7. None of the modes are prominent 
enough, however, to support any conclusions about single modal 
values. Such distributions possess no marked mode, and, when
dealing with them, we cannot use modal values either in description
or in making inferences about the central tendency in the population
they represent. Such distributions may result from sampling fluctua-
tions or peculiar population distributions or from the grouping
scheme employed. The student will find that short series tend to
have several modal values, and that, as a rule, the mode as a measure
of the central tendency of a short series has little usefulness.


Fig. 3.6. Distribution of Arithmetic Fundamentals Test scores.
(School C, Table 3.1.) [Horizontal scale: score.]

There are situations involving distributions having no single
mode, however, in which the concept of modal values may be of
great importance. Consider the data summarized in Fig. 3.8, which
were obtained in a study of pupil transportation costs in 54 districts
of comparable size in a western state in 1947. Analysis of the
data showed that the factor underlying the bimodality was bus
ownership. The modal cost of operating district-owned buses was
22.5 cents per bus mile, while the modal cost of private-contract
buses was 28.5 cents per bus mile.

Fig. 3.7. Distribution of results of throwing two dice 36 times.

Fig. 3.8. Transportation costs per bus mile in 54 school districts.
Fig. 3.9. Scores of 153 graduate students on a mathematics background test.


As another example of important bimodality, the scores of 153 
graduate students on a mathematics background test are shown 
in Fig. 3.9. The students who had majored in mathematics or 
science as undergraduates had an average score of about 50 in the 
test, while those majoring in other fields had an average score of 
about 20. 

Any single measure of central tendency would obscure noteworthy
features of the distributions of Fig. 3.8 and 3.9. The fact
that modes occur at widely different values is important in interpreting
and describing the given data. Comparisons of transportation
costs are fairer and more useful if the factor of bus ownership is
taken into account; a student's score on the mathematics background
test is more meaningful if we know whether the student
majored in mathematics or science.

It may be said, somewhat paradoxically, that the concept of mode 
is indispensable when the central tendency of a distribution which
has no single mode is being studied. The presence of two or more 
modal values in a distribution casts doubt upon the homogeneity 
of the data as classified. As we have seen, the basis for quantitative 
classification and comparison is size, the data being considered to 
be alike in kind or quality. The presence of two or more modes 
suggests that important qualitative differences may underlie the 
data and that more exact and useful comparisons may be possible 
if the differences are taken into account. Unfortunately, it is not 
always possible to locate attributes underlying bimodality, but 
careful and extended analysis of data in which two or more modes 
appear should always be made. 

Uses of the Mode. The mode, which indicates the item of 
greatest frequency in a series, is an average widely used in everyday
life. When newspapers use the term “on the average” they are
usually referring to an outstanding or conspicuous tendency, and 
not to an arithmetic average. The mode is an average which is easy 
to understand, easy to determine, and one which best depicts the 
typical size of the items in a series. The fact that the mode can be 
determined by inspection favors its use as a rough index of the 
central tendency of the frequency distribution. 

The mode is not affected by extreme scores, and we do not need 
to know the extreme scores in a series to determine the modal score. 
For example, if the modal salary of college presidents is desired, we 
need not know the highest and lowest salaries paid to college presi- 
dents to determine the modal salary for the group. 

The mode of a distribution, more than other averages, is affected
by changes in the grouping scheme, and is subject to wide fluctuations
from one sample to the next. It is not at all reliable in small
samples. The mode suffers the additional disadvantage of being
incapable of algebraic treatment. For example, the mode of one
distribution cannot be combined with the mode of another to determine
the mode of the combined distributions. When the number of
cases in a distribution is small, or in situations calling for fine
grouping, the mode may have little meaning. If the modal salary
of 50 workers is $3,000, for example, but if 45 of the 50 receive
salaries different from $3,000, it would be absurd to report a mode.

The concept of mode as a measure of central tendency is chiefly 
useful because it encourages attention to bimodal and multimodal 
data and invites questions regarding the conditions giving rise to
such data. 


Exercises 


1. Describe a specific situation in which the mode may be useful. 

2. The data in the distribution below were obtained in the administration 
of a Wechsler-Bellevue arithmetic reasoning problem to 65 adults, 54 
of whom worked the problem incorrectly. The distribution shows that 
5 of the 54 spent from 11 to 21 seconds on the problem, 15 spent from 
22 to 32 seconds, and so on. Do the data suggest anything about the 
nature of errors or the nature of people making errors on the problem? 
How would you study the matter further? 


TIME IN SECONDS    FREQUENCY
11-21              5
22-32              15
...


3. Examine the effect of a moving average on one of the multimodal
distributions of Table 3.1.

4. Examine the effect upon modal value(s) of a change in the grouping
scheme for one of the distributions in Table 3.1. (The ungrouped scores
are given in Table I, Appendix B.)

5. Suppose that a group of 50 high school boys and 50 high school girls
were given a strength of grip test, or a throwing test, or a rope climbing
test. How do you believe the results would be distributed? Illustrate by
a frequency diagram.

6. It has been said that when a research worker finds markedly bimodal
data he should attempt to discover the factor responsible for the bimodality
and then to classify the original group into two groups on the
basis of the factor for further study. Suppose that an investigator finds
the costs per pupil as reported by the schools in a state to be markedly
bimodal. Can you think of a possible factor underlying the bimodality?
Might more exact study of school costs be possible if the schools were
divided into two groups on the basis of the factor?


The Median 


A second measure of the central tendency of a statistical series is 
known as the median. By definition, the median is that point on the 
scale of scores below which one-half of the scores lie and above which 
one-half of the scores lie. Hence, the median, by virtue of its middle 
position, characterizes the central tendency of a series. 

When a set of scores is ungrouped, the middle score or mid-measure
ordinarily is taken as the median. If, for example, we have
the five scores 8, 12, 15, 16, 19 arranged in order of size, the mid- 
measure is 15. If we have an even number of ungrouped scores 
arranged in order of size, the mid-measure is customarily defined 
as the point halfway between the two middle scores. Thus, in the 
series 6, 8, 10, 13, 14, 16, the mid-measure is (10 + 13)/2 or 11.5. 
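
The rule for ungrouped scores can be sketched in a few lines of Python. This is only an illustration of the definition just given; the function name `mid_measure` is our own and does not come from the text.

```python
def mid_measure(scores):
    """Return the mid-measure (median) of a set of ungrouped scores."""
    ordered = sorted(scores)          # arrange in order of size
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: the middle score itself.
        return ordered[middle]
    # Even number of scores: the point halfway between the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(mid_measure([8, 12, 15, 16, 19]))       # 15
print(mid_measure([6, 8, 10, 13, 14, 16]))    # 11.5
```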

Computation of Median of Grouped Data. A graphical illus- 
tration will serve to clarify the computation of the median of 
grouped data. As was seen in Chapter II, the area of the histogram
is proportional to the number of scores in a distribution. Since the
median by definition is the point on the scale of scores below (or
above) which one-half of the scores lie, a vertical line drawn through
the median will bisect the histogram, and conversely, the vertical
line which bisects the histogram will pass through the median.
With this in mind let us examine the histogram of Fig. 3.10.

The line which bisects the histogram of 37 scores must pass 
through a point on the scale below which 18.5 scores lie. Since 14 
scores lie below the 26.5-29.5 class, the line will have to mark off 
an area equivalent to 4.5 scores in that class in order to mark off 
18.5 scores in all. If we assume that the 6 scores in the 26.5-29.5 
class are distributed evenly over the interval (a necessary assump- 
tion in computing the median), the location of the line is easily 
determined. The 6 scores are distributed over an interval of 3 units
(from 26.5 to 29.5). Hence, each score corresponds to 3/6 unit and
the needed 4.5 scores correspond to 4.5 × 3/6 or 2.25 units. When
we add 2.25 to 26.5, we obtain 28.75 as the value of the point below
which 18.5 scores lie. Hence, 28.75 is the median, or the point
below which one-half of the scores in the distribution lie.


[Histogram: frequency plotted against the scale of scores, with a vertical line at Median = 28.75.]

Fig. 3.10. The median in a histogram. (School D, Table 3.1.)


It is not necessary, of course, to construct a histogram in order
to determine the median of a frequency distribution. We need only to:

a. Determine the number of scores or cases below the class in which
the median falls.
b. Subtract this number from N/2.
c. Divide the difference by the number of cases in the class containing
the median.
d. Multiply the quotient by the class interval.
e. Add the product to the lower real limit of the class in which the
median falls.

This is exactly the procedure which was followed in computing the
median of the distribution represented by the histogram in Fig.
3.10. The procedure may be summarized in the formula

$$\mathrm{Mdn} = L + \left(\frac{N/2 - F}{f}\right) i \qquad (3.1)$$

in which the abbreviation Mdn indicates the median; L is the lower
real limit of the class containing the median; N is, as always, the
total number of cases in the distribution; F is the total number of
cases in the classes below the class containing the median; f is
the number of cases in the class containing the median; and i is the
class interval.
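
Steps (a) through (e) can be checked with a short Python sketch. The function name and argument names below are our own choices, mirroring the symbols L, N, F, f, and i; the numbers are those of the histogram example (37 scores, 14 below the 26.5-29.5 class, 6 scores in that class, class interval 3).

```python
def grouped_median(L, N, F, f, i):
    """Median of grouped data.

    L: lower real limit of the class containing the median
    N: total number of cases
    F: number of cases below the class containing the median
    f: number of cases in the class containing the median
    i: class interval
    """
    needed = N / 2 - F        # step b: cases still needed within the class
    fraction = needed / f     # step c: share of the class that is needed
    return L + fraction * i   # steps d and e


mdn = grouped_median(L=26.5, N=37, F=14, f=6, i=3)
print(mdn)   # 28.75
```

The sketch assumes, as the text does, that the scores in the median class are spread evenly over the interval.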

In finding the median when N is large, it is helpful first to set 
up a cumulative frequency (cf) column, as illustrated in Table 3.2, 
at least up to the class in which the median falls. If the column is 
completed, a check, cf = N, on addition throughout is provided. 


TABLE 3.2

COMPUTATION OF MEDIAN OF GROUPED DATA BY FORMULA
(Data from Table 3.1)

COMPUTATION OF MEDIAN USING FORMULA (3.1)

N = 293    N/2 = 146.5
Since median falls in 30-32 class,

L = 29.5
F = 130    f = 49    i = 3

Mdn = 29.5 + ((146.5 - 130)/49) 3
    = 29.5 + 1.01
    = 30.51

The application of the formula for the median is illustrated in
Table 3.2. The student is advised to study the graphical illustration 
of computing the median and to rely upon understanding rather 
than upon the formula in computing medians. 

Occasionally the student will come across a frequency distribution 
in which the median falls exactly on a real class limit. If an adjacent 
class contains zero frequency, as illustrated below, 


SCORE f
45-49 2 
40-44 5 
35-39 6 
30-34 0 
25-29 8 
20-24 3 
15-19 2 


the median is usually considered to be the mid-point of the class 
interval having zero frequency. In the illustration the median is
considered to be 32.0 rather than 29.5. If two adjacent classes have
zero frequency, the median may be considered to be the real limit 
between the two zero-frequency classes, although in the latter 
situation, at least, there would be present presumptive evidence of 
bimodality, and consequently doubt regarding the fairness of a 
single measure of central tendency. 
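
The convention for a median that falls on a real class limit next to empty classes can be sketched as follows. This is our own generalization, offered as an assumption: the median is placed at the middle of the empty stretch, which gives the mid-point of a single zero-frequency class and the real limit between two zero-frequency classes. The function name and the ascending layout of the classes are ours.

```python
def grouped_median_with_gaps(lower_limits, freqs, width):
    """Median of grouped data, classes listed from lowest to highest.

    lower_limits: lower real limit of each class
    freqs: number of cases in each class
    width: class interval (assumed equal for all classes)
    """
    half = sum(freqs) / 2
    below = 0            # cases below the class now being examined
    last_upper = None    # upper real limit of the last non-empty class
    for low, f in zip(lower_limits, freqs):
        if f == 0:
            continue
        if below == half and last_upper is not None:
            # Median falls exactly on a class limit: take the middle of
            # the stretch between the two neighboring non-empty classes.
            return (last_upper + low) / 2
        if below + f > half:
            return low + (half - below) / f * width
        below += f
        last_upper = low + width
    raise ValueError("distribution has no cases")


# The illustration above, listed from the 15-19 class upward.
limits = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
freqs = [2, 3, 8, 0, 6, 5, 2]
print(grouped_median_with_gaps(limits, freqs, 5))   # 32.0
```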

The median of a frequency distribution of discrete data is computed
exactly as the median of a frequency distribution of continuous
data.

Uses of the Median. The median of a distribution is the point 
below (or above) which one-half of the values lie. Since the median
of the distribution in Table 3.2 is 30.51, we know that the number
of pupils having scores below 30.51 is equal to the number of
pupils having scores above that point. A score below 30.51 is 
“below average” in the sense of being one of the scores in the lower 
half of the distribution. 

In general, the median is easily understood and has several advan- 
tages as an average. When a series contains either a few extremely 
high or a few extremely low scores, relative to the majority of scores 
in the series, the median is perhaps the most representative average 
available, for it is not affected by extreme scores. When averages
of such data as salaries, costs of homes, days lost by workers because
of illness, and ages of people at time of marriage are needed, the
median generally is to be preferred. When the central tendency of
an open-end distribution, i.e., a distribution having a bottom or top
interval of unspecified length, is desired, the median is the most
reliable measure that can be computed.

The median, like the mode, is a nonalgebraic measure, and 
medians of separate distributions cannot be combined to give the 
median of the combined distribution. It has the further disadvan- 
tage of being less dependable than the arithmetic mean, a point 
which will be discussed in the next section. 


Exercises 


7. Compute the median of one or more of the distributions in Table 3.1.
8. Using one of the distributions in Table 3.1, show that the median can
be computed, working from the top of the distribution instead of from
the bottom.
9. What are the mid-measures of the School A and School B distributions
of Table 3.1? (The ungrouped scores are given in Table I, Appendix B.)
10. Combine distributions A and B of Table 3.1 and find the median of the
combined distribution. Could this median be obtained from the medians
of the separate distributions?
11. The median IQ's for two sections of high school freshmen are 105 and
110, respectively. Can we conclude that the median IQ of the total
group is 107.5? Under what conditions would the conclusion be correct?
12. The distribution of the salaries of teachers and administrators in a
small town is shown below. What is the median salary? How much is
the median changed if the three top salaries are not considered?

SALARY           f
$7,000-$7,499    1
6,500- 6,999     0
6,000- 6,499     2
5,500- 5,999     0
5,000- 5,499     0
4,500- 4,999     6
4,000- 4,499     6
3,500- 3,999     15
3,000- 3,499     10
2,500- 2,999     12
2,000- 2,499     4




13. Compute the medians of the two distributions below. 


f              SCORE    f
[illegible]    16-18    12
6              13-15    17
13             10-12    0
17             7-9      15
5              4-6      14

The Arithmetic Mean 


In many cases when we wish to find the average of a set of scores,
we simply divide the sum of the scores by the number of scores in
the set. The result is popularly called the “average”; however, in
statistics it is designated arithmetic mean in order to distinguish it
from other averages. In discourse, the term arithmetic mean usually
is shortened to mean.

The arithmetic mean is defined as the sum of the values in a
series divided by the number. Using $X_1, X_2, X_3, \ldots, X_N$ to
represent the values of the respective N items in a series, the definition
may be written

$$M = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}.$$

The definition may be stated more simply*

$$M = \frac{\Sigma X}{N} \qquad (3.2)$$

in which ΣX is the sum and N the number of the items in the
series. The symbol Σ always refers to sum in statistics.

The mean of a short series is easily determined. For example, the
sum of the 18 Arithmetic Fundamentals Test scores, School B,
Table I, Appendix B, is 495. Since ΣX = 495 and N = 18, the
mean, X̄ or M, of the scores is

$$M = \frac{495}{18} = 27.5.$$

* The items in a quantitative series commonly are designated X, it being
understood that the numerical value of X may vary from item to item. The
expression ΣX indicates summation or addition of all of the items. The symbols
M and X̄ are used interchangeably to indicate the arithmetic mean. If the
more precise notation $X_i$ is used, the formula for the mean is written

$$M = \frac{\sum_{i=1}^{N} X_i}{N},$$

which states explicitly that all of the items in the series are summed.


Although the mean is a simple and easily understood measure of
central tendency, its calculation is laborious when a series is long. 
Hence, we need to consider short-cut methods of computing the 
mean. 

Short-Cut Methods of Finding the Mean. The Arithmetic
Fundamentals Test scores of the 38 pupils of School C, Table I,
Appendix B, are listed below, the number of pupils having each
score being shown in the f column. The entries in the fX column
are the products of the respective scores multiplied by the number
of times they occur.

SCORE X    f     fX        SCORE X    f     fX
46         1     46        24         3     72
38         1     38        23         3     69
37         2     74        22         1     22
35         2     70        21         2     42
34         3    102        20         1     20
33         2     66        19         2     38
31         4    124        18         1     18
29         1     29        17         1     17
28         2     56        16         1     16
27         1     27        10         2     20
26         1     26        SUM       38  1,017
25         1     25

Hence, the sum of the scores is 1,017, and the
mean is 1,017/38 or 26.76. Formula (3.2), when applied as in the
present illustration, may be written

$$M = \frac{\Sigma fX}{N}. \qquad (3.3)$$
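
The arithmetic of formula (3.3) can be verified with a brief Python sketch. The dictionary name is our own, and the score 26 (printed imperfectly in the listing) is inferred from its fX entry of 26.

```python
# Score X mapped to its frequency f, School C (38 pupils).
# The entry 26: 1 is inferred from the fX value of 26 in the listing.
school_c = {
    46: 1, 38: 1, 37: 2, 35: 2, 34: 3, 33: 2, 31: 4, 29: 1,
    28: 2, 27: 1, 26: 1, 25: 1, 24: 3, 23: 3, 22: 1, 21: 2,
    20: 1, 19: 2, 18: 1, 17: 1, 16: 1, 10: 2,
}

n = sum(school_c.values())                         # N, the number of scores
sum_fx = sum(x * f for x, f in school_c.items())   # sum of the fX column
mean = sum_fx / n

print(n, sum_fx, round(mean, 2))   # 38 1017 26.76
```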


Now consider the 38 scores as grouped in Table 3.3. The mid-points
of the class intervals are designated X'. The entries in
the fX' column are products of the mid-points times class frequencies.
Under the assumption that the scores have the mid-values
of their respective classes, we may find the mean by summing the
fX' column and dividing by 38, as shown at the foot of Table 3.3.
This procedure may be described by the formula, analogous to (3.3),

$$M = \frac{\Sigma fX'}{N}, \qquad (3.4)$$

in which f is the frequency in any class whose mid-point is X'.


TABLE 3.3
COMPUTATION OF MEAN OF GROUPED DATA

SCORE X    MIDVALUE OF CLASS X'    f      fX'
45-47      46                      1       46
42-44      43                      0
39-41      40                      0
36-38      37                      3      111
33-35      34                      7      238
30-32      31                      4      124
27-29      28                      4      112
24-26      25                      5      125
21-23      22                      6      132
18-20      19                      4       76
15-17      16                      2       32
12-14      13                      0
9-11       10                      2       20
SUM                               38    1,016

M = 1,016/38 = 26.74


It will be noted that the mean of the ungrouped scores is 26.76,
while the mean of the scores as grouped in Table 3.3 is 26.74. The
difference is due to errors of grouping (see pp. 64-65). Some of
the scores lie above the mid-points of their classes and some below.
But the discrepancies tend to be compensating, both within classes 
and over the entire distribution, and the mean of grouped scores 
ordinarily is a close approximation to the exact mean, although not 
usually quite as close as in the illustration. 
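
As a check on the grouped computation, the following Python sketch applies formula (3.4) to the class mid-points and frequencies of the 38 School C scores. The variable names are ours; the figures are those of the grouped distribution (classes 45-47 down to 9-11, class interval 3).

```python
# Class mid-points X' and class frequencies f, highest class first.
midpoints = [46, 43, 40, 37, 34, 31, 28, 25, 22, 19, 16, 13, 10]
freqs = [1, 0, 0, 3, 7, 4, 4, 5, 6, 4, 2, 0, 2]

n = sum(freqs)                                            # N
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))     # sum of fX'
grouped_mean = sum_fx / n

print(n, sum_fx, round(grouped_mean, 2))   # 38 1016 26.74
```

The grouped mean differs from the exact mean of 1,017/38 only in the second decimal place, which is the compensation of grouping errors described above.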

A further reduction of the labor involved in finding the mean is 
possible by the method systematized in Tables 3.4 and 3.5. The 
method consists essentially in selecting the mid-point of a class 
interval as a reference point or arbitrary origin and expressing
the deviations of the mid-points of the other class intervals from 
the arbitrary origin in unit steps d, as illustrated in the tables. 

Again we must assume that the scores in a class have the mid-value
of that class. When the class mid-points are equispaced,
a particular class mid-point X' (or a score in the class) may be
expressed as

X' = AO + di,

AO being the arbitrary origin, d the deviation of the class from AO
and i the class interval. In Table 3.4, for example, in which 10 is
the arbitrary origin, 22 is the mid-point of the 21-23 class and 22 is
equal to 10 + (+4)3. In Table 3.5, in which 28 is the arbitrary
origin, the mid-point of the 21-23 class is equal to 28 + (-2)3,
and so on. The student can satisfy himself that the relationship
X' = AO + di holds throughout the tables.

TABLE 3.4
COMPUTATION OF MEAN OF GROUPED DATA WITH ARBITRARY
ORIGIN AT MID-POINT OF LOWEST CLASS
(Data from Table 3.3)

SCORE    f     d     fd
45-47    1    12     12
42-44    0    11
39-41    0    10
36-38    3     9     27
33-35    7     8     56
30-32    4     7     28
27-29    4     6     24
24-26    5     5     25
21-23    6     4     24
18-20    4     3     12
15-17    2     2      4
12-14    0     1
9-11     2     0
SUM     38          212

Computation of mean by formula (3.5):
AO = 10, i = 3, Σfd = 212, N = 38
M = 10 + (212/38)3 = 26.74

When we substitute this expression for X' in formula (3.4), we
have

$$M = \frac{\Sigma f(\mathrm{AO} + di)}{N},$$

from which, since the constant AO is summed N times and divided
by N, we obtain

$$M = \mathrm{AO} + \left(\frac{\Sigma fd}{N}\right) i. \qquad (3.5)$$

The application of the formula is illustrated in the spaces below
Tables 3.4 and 3.5.


TABLE 3.5
COMPUTATION OF MEAN OF GROUPED DATA WITH ARBITRARY
ORIGIN AT MID-POINT OF INTERMEDIATE CLASS
(Data from Table 3.3)

SCORE    f     d     fd
45-47    1     6      6
42-44    0     5
39-41    0     4
36-38    3     3      9
33-35    7     2     14
30-32    4     1      4
27-29    4     0
24-26    5    -1     -5
21-23    6    -2    -12
18-20    4    -3    -12
15-17    2    -4     -8
12-14    0    -5
9-11     2    -6    -12
SUM     38          -16

Computation of mean by formula (3.5):
AO = 28, i = 3, Σfd = -16, N = 38
M = 28 + (-16/38)3 = 26.74
The last described short-cut method of finding the arithmetic
mean is sometimes called the coded method, since the d units really 
are coded values of the mid-points of the class intervals. When the 
class intervals of a frequency distribution are equal, the mean of 
the distribution may be found by the coded method with a minimum 
of work and a minimum of danger of computational error. Let us 
summarize the procedure in the following set of directions: 


a. After a frequency distribution having equal class intervals is 
set up, select an arbitrary origin at the mid-point of a class inter- 
val and code this class “0” in a d column. 

b. Code the next higher class “1” in the d column, the second
higher “2,” and so on. Code the next lower class “-1,” the
second lower “-2,” and so on. Be sure that the d values are
increasing in the same direction as the class mid-points.

c. Multiply the d values by their respective frequencies f and enter
the products in an fd column.

d. Find Σfd, the algebraic sum of the fd column, divide Σfd by N,
and multiply the quotient by i, the class interval.

e. Add the result of step (d) to the mid-point of the class interval 
coded “0” to obtain the mean of the distribution. 
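
Directions (a) through (e) may be sketched in Python as below. The function name `coded_mean` is our own; the data are the mid-points and frequencies of the 38 grouped School C scores. The sketch shows that the choice of arbitrary origin does not affect the result.

```python
def coded_mean(midpoints, freqs, origin, interval):
    """Mean of grouped data by the coded method.

    Returns the mean together with the sum of the fd column.
    """
    # Steps a and b: code each class by its distance from the origin in unit steps.
    codes = [(x - origin) // interval for x in midpoints]
    for x, d in zip(midpoints, codes):
        if origin + d * interval != x:
            raise ValueError("class mid-points are not equispaced about the origin")
    # Steps c and d: algebraic sum of the fd products.
    sum_fd = sum(f * d for f, d in zip(freqs, codes))
    n = sum(freqs)
    # Step e: add the scaled quotient to the mid-point coded 0.
    return origin + (sum_fd / n) * interval, sum_fd


midpoints = [46, 43, 40, 37, 34, 31, 28, 25, 22, 19, 16, 13, 10]
freqs = [1, 0, 0, 3, 7, 4, 4, 5, 6, 4, 2, 0, 2]

mean_low, sum_fd_low = coded_mean(midpoints, freqs, origin=10, interval=3)
mean_mid, sum_fd_mid = coded_mean(midpoints, freqs, origin=28, interval=3)
print(sum_fd_low, round(mean_low, 2))   # 212 26.74
print(sum_fd_mid, round(mean_mid, 2))   # -16 26.74
```

An origin near the middle of the distribution keeps the d values and the fd products small, which is the practical reason for preferring it in hand computation.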


Uses of the Arithmetic Mean. The arithmetic mean is the 
most widely used measure of central tendency. Although usually 
somewhat more difficult to compute than the mode or median, its 
definition and meaning are easily understood. The mean perhaps
best conveys the idea of average value, since it is derived from the 
exact values of the items in the series. 

The fact that the mean is based upon the sum of the values in a 
series enhances its usefulness in some situations. If we have a set of 
independent observations of the same thing, e.g., the ratings of 
several judges of an individual or a set of measurements of a dimen- 
sion or property of an object, the mean is extremely useful. When 
it can be shown, as is often the case, that the errors in a set of obser- 
vations tend to be compensating, the mean of the observations is 
relatively unbiased. 

But in other situations the fact that the mean is affected by the 
value of each item works against its fairness as a measure of central
tendency. As was previously noted, when a series includes a few
items of either high or low values, relative to the values of the
majority, the mean is not a fair measure of central tendency. If, 
for example, the yearly incomes of six lawyers in a small town were 
$25,000, $6,000, $6,000, $5,000, $5,000 and $4,000, we would not 
ordinarily be satisfied with the mean $8,500 as representative of 
the average salary of lawyers in the town. To liken the prospects 
of practicing law in that town with the prospects of practicing law 
in a town in which all lawyers earned between $8,000 and $9,000 
yearly would be grossly misleading. 
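
The effect of the single large income can be seen by setting the mean beside the median. The sketch below uses Python's standard `statistics` module; the median of 5,500 is our own computation and is not quoted in the text.

```python
import statistics

# Yearly incomes of the six lawyers in the example.
incomes = [25000, 6000, 6000, 5000, 5000, 4000]

mean_income = statistics.mean(incomes)       # pulled upward by the $25,000 income
median_income = statistics.median(incomes)   # halfway between 5,000 and 6,000

print(mean_income, median_income)
```

Five of the six lawyers earn less than the mean of $8,500, whereas the median of $5,500 lies in the midst of the typical incomes.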

The mean is applicable to series of any length, from two values 
upward. It is an algebraic function of the values, and this property 
adds enormously to its usefulness in statistical work. If we have 
two or more subgroups, the mean of the total group can readily be 
determined from the numbers and means of the subgroups. In 
exr. 23, the student is asked to prove the formula

$$M_t = \frac{N_1 M_1 + N_2 M_2}{N_1 + N_2} \qquad (3.6)$$

in which $M_t$ is the mean of the total group, $N_1$ and $M_1$ are the
number and mean respectively of the values of one group, and $N_2$
and $M_2$ are those of the other. The formula can easily be extended
to more than two series. The mode and median of combined series 
cannot be determined from those of the separate series. 
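
A short Python sketch illustrates the combining of means, written for any number of subgroups. The function name is our own. As an illustration we combine the two series whose means were found earlier in this section: the 18 School B scores (mean 27.5) and the 38 School C scores (mean 1,017/38).

```python
def combined_mean(groups):
    """Mean of a total group from (number, mean) pairs of its subgroups."""
    total_n = sum(n for n, _ in groups)
    total_sum = sum(n * m for n, m in groups)   # each N times M recovers a subgroup sum
    return total_sum / total_n


groups = [(18, 27.5), (38, 1017 / 38)]
print(round(combined_mean(groups), 2))   # 27.0
```

The result agrees with the direct computation (495 + 1,017)/56 = 27.0.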

The mean can be used to obtain an average value of a series after 
each item is weighted. For example, suppose we wish to combine 
the three test scores 60, 72, and 85 of a student into a composite 
score, and suppose we wish the last score to count four times as 
much as the first and the second to count twice as much as the
first. The weighted mean is

$$M_w = \frac{(60 \times 1) + (72 \times 2) + (85 \times 4)}{1 + 2 + 4} = 77.7.$$


The unweighted mean of the scores is (60 + 72 + 85)/3 or 72.3. 
There are a great many situations in which a weighted arithmetic 
mean may be useful. Whenever a composite or “average” must be 
derived from values of unequal importance or of unequal reliability, 
the weighted mean is appropriate, provided a system of weights
can be worked out and justified. 
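
The weighted mean of the three test scores can be reproduced with the following Python sketch; the function name `weighted_mean` is our own.

```python
def weighted_mean(values, weights):
    """Sum of value-times-weight products divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [60, 72, 85]
weights = [1, 2, 4]

print(round(weighted_mean(scores, weights), 1))     # 77.7
print(round(weighted_mean(scores, [1, 1, 1]), 1))   # 72.3, the unweighted mean
```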
In estimating from a sample of scores the point of central tendency
in a population, the mean tends to be more dependable than
the mode or median. In statistical theory, it can be shown that the 
means of samples drawn from a normal population differ less among 
themselves and less from the “true” point of central tendency than 


TABLE 3.6 


MEANS, MEDIANS, AND MODES OF 10 RANDOM SAMPLES 
DRAWN FROM A NORMAL POPULATION 


SAMPLE DISTRIBUTION 


SCORE 1 2 3 4 5 6 7 8 9 10
65-69 1 1 1 
60-64 1 1 1 3 3 1 0 0 
55-59 2 4 3 3 2 1 1 3 0 3 
50-54 3 T 6 1 1 6 tf 9 1 8 
45—49 10 8 1 10 8 6 5 6 9 6 
40-44 10 TÉ 8 8 7 13 11 8 1 10 
35-39 7 5 10 13 13 12 1 T 9 T 
30-34 7 7 1 6 11 4 7 T 10 9 
25—29 5 5 6 3 2 2 6 6 6 5 
20-24 2 1 2 5 Б] 1 4 3 1 0 
15-19 2 2 2 1 1 1 m 
10-14 1 1 1 
MEAN 39.70 40.60 40.80 39.10 39.50 41.20 38.60 40.70 37.40 40.00
MEDIAN 40.50 41.77 42.00 38.73 37.96 41.04 39.50 40.75 37.83 40.50
MODE 44.5 42 47 37 37 42 42 52 42 42 


the modes or medians. It follows that, when we make inferences
from sample evidence about the central tendency in a population, 
we shall run less risk of error if we use the sample mean as the esti- 
mate of central tendency. Let us illustrate this important point by 
examining several samples drawn from the normally distributed 
scores in Table III, Appendix B.

The distributions of 10 samples of 50 scores each are shown in 
Table 3.6. The samples were drawn as follows: in the set of random 
numbers, p. 568, a beginning point was taken at row 34, columns 
3-4-5, so that the first random number read was 251, the second 
000, the third 208, and so on, reading upward and skipping numbers 
greater than 399. The first 3 scores of the first sample were, consequently,
41, 30, and 49, since these are the scores in Table III corresponding
to the random numbers 251, 000, and 208. When the
top of columns 3-4-5 was reached, columns 4-5-6 were read down- 
ward so that the scores numbered 041, 287, and so on were selected. 
When the bottom of columns 4-5-6 was reached, columns 5-6-7 
were read upward. This procedure was followed throughout the set 
of random numbers until 10 samples of 50 scores each were drawn, 
each score being used as many times as its number was encountered. 
The first 50 scores drawn comprised the first sample, the second 50 
the second sample and so on. 

The means, the medians, and the modes in the 10 samples are 
shown in Table 3.6. The mean, median, and mode of the 400 scores 
in Table III are each 40. It will be noted that the mode is not a
good estimate of the point of central tendency, which for this 
population is 40. Even if we had utilized a grouping scheme which 
would make 40 a class mid-point, the mode would have fluctuated 
more than the mean and median. (In a real sampling problem, of 
course, we do not know the point of central tendency in the popu- 
lation and hence do not know whether or not the grouping scheme 
for the sample tends to distort the mode.) The mode, as has been 
pointed out previously, is quite susceptible both to sampling fluctu- 
ations and to changes in grouping schemes. 

The mean is superior to the median as an estimate of the point 
of central tendency in the population of 400 scores, i.e., the mean
is closer to 40, in seven of the ten samples, and inferior in three. 
The deviations of the ten means, disregarding signs, from the true 
value 40 are .30, .60, .80, .90, .50, 1.20, 1.40, .70, 2.60, and .00, with 
an average of .90; the deviations of the ten medians are .50, 1.77, 
2.00, 1.27, 2.04, 1.04, .50, .75, 2.17, and .50, with an average value 
of 1.25. The comparisons are shown graphically in Fig. 3.11. 
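
The comparison can be retraced with a short Python sketch. The variable names are ours. The third and ninth sample means are entered as 40.80 and 37.40, the values implied by the deviations .80 and 2.60 quoted above.

```python
# Sample means and medians of the 10 samples; the true value is 40.
means = [39.70, 40.60, 40.80, 39.10, 39.50, 41.20, 38.60, 40.70, 37.40, 40.00]
medians = [40.50, 41.77, 42.00, 38.73, 37.96, 41.04, 39.50, 40.75, 37.83, 40.50]
true_value = 40

# Deviations from the true value, disregarding signs.
mean_devs = [abs(m - true_value) for m in means]
median_devs = [abs(m - true_value) for m in medians]

avg_mean_dev = sum(mean_devs) / len(mean_devs)
avg_median_dev = sum(median_devs) / len(median_devs)

# Number of samples in which the mean lies closer to 40 than the median does.
mean_closer = sum(1 for a, b in zip(mean_devs, median_devs) if a < b)

print(round(avg_mean_dev, 2), round(avg_median_dev, 2), mean_closer)
```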

In the illustrative example, the mean is clearly the more trust- 
worthy estimate of central tendency in the sampled population. 
If we wanted to estimate the central tendency of the population 
from a single sample, as is ordinarily the case in the practical situ- 
ation, we could have somewhat more confidence in the mean than 
the median or mode as an estimate. 

Although the greater reliability of the mean as compared to the 
mode and median can be demonstrated analytically only for
samples from a normal population, empirical studies indicate that 
the mean maintains this superiority as a measure of central tend- 
ency in the majority of the unimodal distributions encountered in 
educational research. 


[Line graph: sample values plotted against sample number, 1 to 10.]

Fig. 3.11. Sampling fluctuations of the mean (solid line) and the median
(broken line). (From Table 3.6.)


Because of its reliability and its algebraic properties, the mean is 
the most widely used measure of central tendency. The mean ordi- 
narily permits more exact comparison of two or more series than 
the mode or median, and provides estimates of central tendency 
of greater utility. The student will find as he proceeds with his 
study that much of statistical method involves the mean. It is 
generally a good policy to use the mean as the measure of central 
tendency of a statistical series, unless there is special reason for 
using the mode or median, or one of the two measures to be described 
in the following sections. 


Exercises 


14. Compute the means of the distributions of Table 3.1. For which of the
distributions do you believe modes should be reported? For which do
you believe the median is a better measure of central tendency than
the mean?

15. Take several random samples of 50 scores each from the 400 normally
distributed scores of Table III, Appendix B. Find the modes, medians,
and means of the samples and compare with those of the samples given
in Table 3.6.

16. Interpret the fact that while the modal and median pupil chronological
age for any elementary school grade have changed little during the
past 25 years, there has been a decrease in the mean chronological age.

17. Find the mean of the distribution of salaries given in exr. 12. What is
the mean if the three top salaries are not considered?

18. An object is weighed five times on a laboratory balance, and the readings
are 2.66 gm., 2.60 gm., 2.64 gm., 2.64 gm., and 2.61 gm. Under
what conditions will the arithmetic mean of these readings be a good
estimate of the “true” weight of the object?

19. Five judges, using a 10-point scale, rated an individual on initiative as
follows: 9, 7, 7, 6, 4. Under what conditions would the arithmetic mean
of the ratings be a good estimate of the “true” initiative of the
individual?

20. A midwestern city reported a weather mean temperature of 68° for
1952, which was exactly the same as that reported by a Pacific Coast
city. What important information is concealed?

21. The mean of the ages of 30 children in a sixth-grade class was found to
be 11 yrs. 2 months. What significant facts about the group are concealed
if only the mean is reported?

22. A college student completed 120 hrs. of undergraduate work with hours
and grades distributed as shown below. Suggest several ways of arriving
at a numerical average grade for the student. In what way is
any average misleading?

FIELD          HRS.    GRADE      FIELD           HRS.    GRADE
Mathematics    36                 Economics
Science        12                 Philosophy
Science        12                 Fine Arts
English        12                 Physical Ed.
History        10                 Music
Language       10

23. The mean as defined by formula (3.2) is M = ΣX/N. Can you find
ΣX if you know M and N? Utilizing this relationship show that formula
(3.6) gives the mean of two series combined. Does the relationship
make it possible to find the mean of a distribution made up of more
than two series?

24. A halfback carried the ball 10 times with a mean gain of 7 yds. in one 
game; 15 times with a mean loss of 2 yds. in a second; 6 times with a 
mean gain of 0 yds. in a third; and 10 times with a mean gain of 12 
yds. in a fourth. What was his mean gain for the four games? 

25. If the 10 samples of Table 3.6 were combined, what would be the mean 
of the resulting distribution of 500 scores? 

26. If the N scores of a series are represented by $X_1, X_2, \ldots, X_N$ and
the mean by $M$, the deviations of the scores from the mean are $X_1 - M$,
$X_2 - M, \ldots, X_N - M$. Show that the mean of these deviations is
zero. If necessary, study the matter by examining several short series, 
such as 1, 3, 5, 5, 6. 

27. If a constant is subtracted from each score in a series, what is the 
effect on the mean? Utilize this information in finding the mean of the 
scores 91, 94, 98, 98, 99.


The Geometric Mean 


Although the arithmetic mean, the median, and the mode are by 
far the most widely used measures of the central tendency of sta- 
tistical series, there are two classes of problems in which none of 
the three is appropriate. As an instance of the problems in one class,
consider the following. 

The total enrollment in private and public secondary schools in 
1900 was 696,000 and in 1920, 2,496,000. What is the best estimate 
of the enrollment in 1910 that can be made from these figures? If 
we take the arithmetic mean of the two figures, we obtain to three- 
figure accuracy 1,600,000. Such an estimate would be based upon 
the assumption that the enrollments had increased by constant 
amounts for each period. In nearly all kinds of population problems, 
however, the increase tends to be compounded, i.e., populations 
increase in each period by a percentage of their size at the beginning 
of the period. This is similar to what takes place in compound 
interest. In such situations the arithmetic mean tends to be less 
satisfactory than an average based upon the product of the values 
in the series. 

In the illustrative problem, if the proportional increase in each
ten-year period is assumed to have been the same, the ratio of the
enrollment in 1910 to that in 1900 is equal to the ratio of the
enrollment in 1920 to that in 1910, i.e.,


$$\frac{E_{1910}}{696{,}000} = \frac{2{,}496{,}000}{E_{1910}}$$


so that $E_{1910} = \sqrt{696{,}000 \times 2{,}496{,}000}$. Solving for $E_{1910}$ we obtain
1,320,000 as the estimate of the enrollment in 1910. The ratio of
successive enrollment figures is 1,320,000/696,000 (or 2,496,000/
1,320,000) or about 1.9. Using this ratio we would estimate an
enrollment in 1930 of 2,496,000 × 1.9 or 4,740,000. The enrollments
reported by the United States Office of Education were 1,111,000
in 1910 and 4,800,000 in 1930. Had we used the arithmetic mean
and the arithmetic differences between enrollments in 1900, 1910,
and 1920 to estimate the 1930 enrollment, we would have obtained
about 3,400,000. Thus, the estimates we obtain by the “product-root”
method are much superior in this case to those provided by
the arithmetic mean. The number 1,320,000 is the geometric mean
of the two numbers 696,000 and 2,496,000.
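
The comparison of the two kinds of estimate can be checked by machine. The following is a minimal sketch in Python and is not part of the original text; the variable and function names are illustrative assumptions. It carries the ten-year ratio unrounded, so its 1930 projection differs slightly from the figure obtained above with the rounded ratio 1.9.

```python
import math

def geometric_mean_of_two(a, b):
    """Square root of the product of two positive values."""
    return math.sqrt(a * b)

e_1900 = 696_000
e_1920 = 2_496_000

# Estimates of the 1910 enrollment under the two assumptions.
arithmetic_1910 = (e_1900 + e_1920) / 2                  # constant amount per decade
geometric_1910 = geometric_mean_of_two(e_1900, e_1920)   # constant ratio per decade

# Projections to 1930 under the same two assumptions.
decade_ratio = geometric_1910 / e_1900
geometric_1930 = e_1920 * decade_ratio
arithmetic_1930 = e_1920 + (e_1920 - arithmetic_1910)

print(round(arithmetic_1910), round(geometric_1910))
print(round(decade_ratio, 2), round(geometric_1930), round(arithmetic_1930))
```

The geometric estimates lie nearer the reported enrollments than the arithmetic ones, which is the point of the illustration.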

In general, the geometric mean (GM) of a set of N values is defined 
as the Nth root of the product of the N values. Symbolically, 


$$GM = \sqrt[N]{(X_1)(X_2)(X_3) \cdots (X_N)} \qquad (3.7)$$


Since it is derived from the product and number of items in a
series, the geometric mean will be the same for two or more series 
if the numbers of the items and their products are the same, what- 
ever the respective values of the items. This property of the geo- 
metric mean makes it extremely useful in dealing with series in which 
successive values tend to be related through a constant ratio. 

Geometric Mean as an Average of Ratios. To demonstrate 
the usefulness of the geometric mean as an average of ratios, let us 
consider the problem of estimating the numbers of secondary school 
graduates in 1937 and 1941, given the facts that the number of 
graduates in 1930 was 667,000 and in 1940, 1,228,000. If we have 
no other information, the best assumption we can make is that 
there has been a constant rate of increase for each year, and that 
consequently the ratio r of the number of graduates for a given 
year to the number for the preceding year is constant. By this
assumption the number of graduates for any year is the product of
r and the number for the preceding year, and we may write:


            NUMBER OF
YEAR        GRADUATES
1930        667,000
1931        667,000 $r$
1932        667,000 $r^{2}$
1933        667,000 $r^{3}$
1934        667,000 $r^{4}$
1935        667,000 $r^{5}$
1936        667,000 $r^{6}$
1937        667,000 $r^{7}$
1938        667,000 $r^{8}$
1939        667,000 $r^{9}$
1940        667,000 $r^{10}$


Since it is given that there were 1,228,000 graduates in 1940, we have


$$r^{10} = \frac{1{,}228{,}000}{667{,}000} = 1.84$$


Utilizing logarithms, we find that r is equal to 1.06. The root 1.06 is
the geometric mean of the ratios, and the average annual rate of
increase is 6 per cent. If we actually knew each of the 10 yearly
ratios and determined their geometric mean, we would obtain 1.06.
We may now estimate the number of graduates for 1937 as 667,000 ×
$1.06^{7}$, or 1,020,000, and the number for 1941 as 1,228,000 × 1.06,
or 1,300,000. These are the best estimates we can make under the 
circumstances. 
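
Under the same constant-ratio assumption, the yearly ratio and the two estimates can be sketched in Python as follows. This is an illustration only and not part of the original text; `estimate` is a hypothetical helper. The ratio is carried unrounded (about 1.063), so the results differ slightly from those obtained with the rounded value 1.06.

```python
grads_1930 = 667_000
grads_1940 = 1_228_000
years = 10

# Constant yearly ratio r satisfying grads_1930 * r**10 == grads_1940.
r = (grads_1940 / grads_1930) ** (1 / years)

def estimate(year):
    """Estimated graduates in `year`, assuming the constant yearly ratio r."""
    return grads_1930 * r ** (year - 1930)

print(round(r, 4))
print(round(estimate(1937)), round(estimate(1941)))
```

Years before 1940 are interpolated and years after it are extrapolated by the same expression, since only the exponent changes.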

The geometric mean is useful in connection with studies of indi- 
vidual growth and development, as well as with studies of popu- 
lation change. (See exr. 31.) In any situation in which we wish to 
interpolate or extrapolate, i.e., estimate either intermediate or 
projected values, in a statistical series in which the values tend to 
differ by a constant ratio, the geometric mean is useful. 

The Geometric Mean in Positively Skewed Distributions. 
Before considering the second class of problems in which the geo- 
metric mean may be more useful than the mean, median, or mode, 
we need a convenient method of computing the geometric mean. 


If we take logarithms in formula (3.7), we will have


$$\log GM = \frac{\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_N}{N} \qquad (3.8)$$


Hence, to find the geometric mean of a series of N values, we need
only add the logarithms of the respective values, divide by N, and
find the antilogarithm of the quotient. The procedure is illustrated
in Table 3.7, in which the geometric mean of instructional cost per
pupil for a school district is determined.


TABLE 3.7
COMPUTATION OF GEOMETRIC MEAN


          PER PUPIL COST
YEAR            X            LOG X
1942         $45.02         1.6534
1944          50.20         1.7007
1946          60.87         1.7844
1948          75.44         1.8776
1950          95.36         1.9794
SUM         $326.89         8.9955

log GM = 8.9955/5 = 1.7991
GM = $62.97
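
The logarithmic procedure of Table 3.7 may be sketched in Python as follows. This is offered only as an illustration and is not part of the original text; the variable names are assumptions. The last line cross-checks the result against the direct definition, the Nth root of the product.

```python
import math

costs = [45.02, 50.20, 60.87, 75.44, 95.36]  # per-pupil costs from Table 3.7

# Add the logarithms, divide by N, and take the antilogarithm.
logs = [math.log10(x) for x in costs]
log_gm = sum(logs) / len(logs)
gm = 10 ** log_gm

# Cross-check: the Nth root of the product of the N values.
gm_direct = math.prod(costs) ** (1 / len(costs))

print(round(log_gm, 4), round(gm, 2), round(gm_direct, 2))
```

Four-place logarithm tables and the machine computation may differ in the last figure, since the tabled logarithms are themselves rounded.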


If we assume that the items of a grouped series have the mid- 
values of their respective classes, the extension of formula (3.8) to 
the frequency distribution is easy. We need only to multiply the 
logarithms of the mid-points of the class intervals by the respective 
class frequencies before summation and division, as illustrated in 
Table 3.8. 

When a distribution is severely skewed to the right, the geometric 
mean may be fairer and more reliable than other measures of central 
tendency. Consider the data of Table 3.8. These data were obtained 
by observing the number of seconds each of two subjects spent on 
each of the Minnesota Paper Form Board items which he worked 
correctly.


TABLE 3.8
COMPUTATION OF GEOMETRIC MEANS OF RESPONSE
TIMES OF TWO SUBJECTS
(Data from ref. 4, p. 224)

                                                              (NUMBER OF RESPONSES) ×
TIME IN    NUMBER OF RESPONSES     MID-POINT OF   LOG OF        (LOG OF MID-POINT)
SECONDS    SUBJECT 1   SUBJECT 2   INTERVAL       MID-POINT    SUBJECT 1   SUBJECT 2


0-4            3           1           2           0.3010        0.9030      0.3010
5-9           21          19           7           0.8451       17.7471     16.0569
10-14          9          11          12           1.0792        9.7128     11.8712
15-19         10           4          17           1.2304       12.3040      4.9216
20-24          5           2          22           1.3424        6.7120      2.6848
25-29          4           3          27           1.4314        5.7256      4.2942
30-34          1           1          32           1.5051        1.5051      1.5051
35-39                      1          37           1.5682                    1.5682
40-44                      1          42           1.6232                    1.6232
45-49
50-54                      1          52           1.7160                    1.7160
55-59          1                      57           1.7559        1.7559
60-64          1                      62           1.7924        1.7924
SUM           55          44                                    58.1579     46.5422


Using the sums at the foot of Table 3.8, we obtain for the first
distribution


$$\log GM = \frac{58.1579}{55} = 1.0574, \qquad GM = 11.4 \text{ sec.}$$


and for the second distribution


$$\log GM = \frac{46.5422}{44} = 1.0578, \qquad GM = 11.4 \text{ sec.}$$


The student can verify that the arithmetic mean and median for 
the first distribution are 14.5 and 11.4 seconds, respectively, and 
for the second, 14.2 and 10.4 seconds. 
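
The verification suggested above can be sketched in Python. The block is an illustration and not part of the original text; the function names are hypothetical, the frequencies are those of Table 3.8 as checked against its column totals of 55 and 44, and the median is found by linear interpolation within a class of width 5, which is assumed to be the method intended.

```python
import math

midpoints = [2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62]
subject_1 = [3, 21, 9, 10, 5, 4, 1, 0, 0, 0, 0, 1, 1]
subject_2 = [1, 19, 11, 4, 2, 3, 1, 1, 1, 0, 1, 0, 0]

def grouped_geometric_mean(mids, freqs):
    """Antilog of the frequency-weighted mean of the log mid-points."""
    log_sum = sum(f * math.log10(m) for m, f in zip(mids, freqs))
    return 10 ** (log_sum / sum(freqs))

def grouped_arithmetic_mean(mids, freqs):
    """Frequency-weighted mean of the mid-points."""
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)

def grouped_median(mids, freqs, width=5):
    """Interpolate within the class that contains the N/2-th case."""
    half = sum(freqs) / 2
    cumulative = 0
    for m, f in zip(mids, freqs):
        if f and cumulative + f >= half:
            lower_limit = m - width / 2  # real lower limit of the class
            return lower_limit + (half - cumulative) / f * width
        cumulative += f

for freqs in (subject_1, subject_2):
    print(round(grouped_geometric_mean(midpoints, freqs), 1),
          round(grouped_arithmetic_mean(midpoints, freqs), 1),
          round(grouped_median(midpoints, freqs), 1))
```

For each subject the geometric mean falls between the median and the arithmetic mean or close to the median, as the discussion of skewed distributions would lead one to expect.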

The geometric mean of a positively skewed distribution will,
as a rule, be closer to the median than is the arithmetic mean. In such
distributions, the geometric mean is a somewhat fairer measure of
central tendency than the arithmetic mean and tends to be a somewhat
more reliable measure than the median. In the above illustration,
the fairest and safest conclusion would be that the two
subjects are equal in “average” speed of response, as elicited by
the given tasks, the average being 11.4 sec.

It will have been noted that the geometric mean of a set of scores 
is the antilogarithm of the arithmetic mean of the logarithms of the 
scores. In situations where the distribution of the logarithms of a 
set of scores is more nearly normal than the distribution of the 
original scores, the geometric mean is an appropriate measure of 
central tendency. The student will find several examples of the 
use of logarithms in “normalizing” positively skewed distributions 
in Ref. 4. 

The geometric mean is not applicable to series containing 0 or 
negative values. It is 0 when any value in the series is 0 and has 
little meaning when one or more of the values is negative. 


Exercises 


28. The average salaries of public school teachers in the United States for 
school years ending in 1900, 1910, and so on were: 1900, $325; 1910, 
$485; 1920, $903; 1930, $1,516; 1940, $1,564; 1950, $3,283. Compute 
the geometric mean. Which salary is most out of proportion? Find the 
ratio of the series and estimate the average salary for 1960. 

29. Estimate the number of high school graduates in 1932 from the data,
p. 104. What assumption is made in estimating this number?

30. Estimate the number of high school graduates in 1942 from the data,
p. 104. What assumption is made in estimating future values in a
geometric series?

31. The development of children in some abilities, such as vocabulary,
tends to be compounded rather than additive from, say, half year
to half year during years of rapid development. Suppose that you
have given a test measuring such an ability to groups of five-year-olds
and seven-year-olds and that the mean scores are 2.0 and 32.0,
respectively. How would you estimate the norms or expected mean scores of
groups of ages 5 yrs. 6 months; 6 yrs. 0 months; 6 yrs. 6 months?
Compare these norms with those you would estimate if you assumed that
the ability developed in additive fashion.


The Harmonic Mean


The last measure of central tendency we shall discuss is the har- 
monic mean, HM. It is defined as the reciprocal of the mean of the 
reciprocals of the values in a series. (The reciprocal of a number is 
1 divided by the number, e.g., the reciprocal of 4 is 1/4.) In symbols, 


$$HM = \frac{1}{\dfrac{1}{N}\sum \dfrac{1}{X}} = \frac{N}{\sum (1/X)} \qquad (3.9)$$


The harmonic mean of the series 6, 9, 12, 12, and 15 is


$$HM = \frac{5}{1/6 + 1/9 + 1/12 + 1/12 + 1/15} = \frac{900}{92} = 9.8.$$


The harmonic mean is principally useful in connection with prob- 
lems involving averages of rates of work. It is well known that 
rates can be expressed in either of two forms, (1) time required 
per unit amount accomplished, and (2) amount accomplished per 
unit time. In illustration, a pupil who completes 15 test items in 
10 minutes may be said to be working at the rate of 2/3 minute per 
item or at the rate of 1 1/2 items per minute.

As an example of the use of the harmonic mean, suppose that one 
wishes to find the average rate of reading, in terms of words per 
minute, of 5 pupils who require 6, 9, 12, 12, and 15 minutes, respec- 
tively, to read a passage containing 1,800 words. The harmonic mean 
of the series is 900/92 minutes per 1,800 words, or, expressed in 
the desired form, 184 words per minute. The student can verify 
that the same mean would be obtained if the original series were ex- 
pressed in terms of words per minute and the arithmetic mean taken. 
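
The verification just proposed can be sketched in Python with the standard `statistics` module. The block is an illustration and not part of the original text; the variable names are assumptions.

```python
from statistics import harmonic_mean, mean

minutes = [6, 9, 12, 12, 15]   # minutes each pupil needs for the passage
passage_words = 1800

# Harmonic mean of the times, then converted to words per minute.
hm_minutes = harmonic_mean(minutes)
rate_from_hm = passage_words / hm_minutes

# Arithmetic mean of the same rates expressed as words per minute.
words_per_minute = [passage_words / m for m in minutes]
rate_from_am = mean(words_per_minute)

print(round(hm_minutes, 1), round(rate_from_hm), round(rate_from_am))
```

Both routes give the same average rate, which is why the harmonic mean adds nothing that the arithmetic mean of the reciprocal form would not supply.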

In general, when rates are expressed in terms of time per unit 
amount, the harmonic mean of the rates is equivalent to the arith- 
metic mean of the same rates expressed in terms of amount per unit 
time, and vice versa. The same can be said about prices, since 
prices may be stated in terms of money per unit material or in terms 
of material per unit money, e.g., the price of pencils may be stated 
as so many cents for a pencil or so many pencils for a dollar.

The harmonic mean provides no information which cannot be 
obtained by the arithmetic mean, and, for this reason, the harmonic 
mean is a luxury measure of central tendency. It is sometimes
convenient to use when one has a long series of rates or prices expressed
in one form and wishes to know what the arithmetic mean would 
be if the series were expressed in the other form. As a rule, however, 
it is advisable to express the original rates or prices in the form 
desired so that their frequency distribution may be inspected before 
any average is taken. When this is done, the arithmetic mean can 
of course be obtained directly. 

The harmonic mean, like the geometric mean, is meaningless in 
any series containing zero or negative values. 


Exercises 


32. Shop student A can do a unit of work in 12 min., B in 15 min., C in
20 min. What is their average rate of working? At this rate how many 
units will they turn out in 4 hrs? Show that the answer may be ob- 
tained by using either the harmonic or the arithmetic mean. 

33. The 5 members of one group of workers require the following numbers 
of min. to complete 10 tasks: 12, 30, 15, 20, 15. The 5 members of a 
second group require 10, 30, 12, 10, 30 min. to complete the 10 tasks. 
Find the arithmetic and harmonic means of both series. In what sense 
does the harmonic mean provide a fairer comparison of the two groups 
than the arithmetic mean? Express the rates in reciprocal form and 
compare their arithmetic means with the harmonic means of the rates 
as given. 

34. A test publisher advertises tests as follows: reading tests, 36 per $1.00; 
arithmetic tests, 18 per $1.00; English tests, 24 per $1.00. Compare 
the harmonic mean of the prices as stated with the arithmetic mean of 
the prices per test. 

35. The number of minutes required by each of 6 typists to type 500 words 
correctly were 10, 12, 12, 15, 15, and 20. If the 6 type at their respective
rates for an hour, how many words will they type? 


Interpretation and Use of Measures of Central Tendency 


When the student first encounters measures of central tendency, 
he may become so engrossed in their calculation that he loses sight 
of their meaning. In the most general sense, the calculation of a 
measure of central tendency is a process of reducing a statistical 
series to a single, summarizing figure. The process is necessary in 
comparing and describing series for the simple reason that the mind 
cannot grasp the meaning of a series in all of its details.

The reduction of a series to an average value is not without
danger of distorting information. Variability is an important feature
of a statistical series. An average value conceals this feature, and a
comparison of average values may be unfair and misleading if the
series are dissimilar in variability. An average does not have
meaning independent of the other characteristics of a statistical series;
in fact, if a series is highly variable or irregular and rich in detail,
an average may have no real meaning and serve no useful purpose
at all.

Appropriate Uses of Averages. The question of which average 
to use in summarizing a given series is an important question, but 
one which permits no thumb-rule answers. A question which
antecedes “which average” is whether any average will facilitate useful
analysis and comparison.

Assuming that a given series is amenable to reduction to an 
average value of some sort, the selection of a particular average 
involves the considerations which have been dealt with in previous 
sections of this chapter. In summary form these are: 


The arithmetic mean is the most widely used and useful measure of 
central tendency. It is the most reliable measure, as a rule, and is simply 
and clearly defined. It perhaps best expresses the idea of an average value. 
Being an algebraic quantity, the mean is tractable in mathematical 
analysis. The most precise measures of variability and relationship, to 
be described later, involve the mean. It is generally advisable to use the 
mean as the measure of central tendency unless there is special reason 
for not using it. 

The median is particularly useful in four situations. First, if a series 
contains a few extreme or exceptional values, the median generally 
gives a fairer impression of the average value of the series than the mean.
It is usually the case that, when the median of a distribution is markedly 
different from the arithmetic mean, the former is the better average. 
Second, if there is doubt regarding the nature of the unit of measure- 
ment, the summation of a set of scores may be unsound. In this situation, 
the median as a point below and above which one-half of the scores lies 
is perhaps the most accurate statement of central tendency which can 
be made. Third, if a distribution has an upper or lower class interval of 
unspecified length, the median is the most reliable measure which can be 
obtained. Fourth, the median is a member of the percentile system, and 
hence is an appropriate average when a distribution is described and 
interpreted in terms of percentiles. (See pp. 125-131.)

The mode is appropriate when a quick approximation to the point of
concentration or “piling up” of the items in a series is desired. It is the
only average available, if information regarding the value of greatest 
frequency or the most typical case is needed. Except for these rather 
unusual cases, the mode has little utility as an average in applied sta- 
tistics. It is an unreliable and nonalgebraic measure. The concept of 
mode is primarily useful in analyzing and interpreting series having 
two or more points of concentration. 

The geometric and harmonic means are sometimes called minor means. 
They are not widely used in educational work; the latter, in particular, 
is of very limited usefulness. When faced with the problem of determining 
an average of rates or prices involving a change of form, it is possible 
to change form first and then to use the arithmetic mean, instead of
finding the harmonic mean of the rates or prices in their original form, 
although the latter procedure may be more convenient. 

The geometric mean is useful in dealing with change or growth data, 
when the values suggest a geometric series. It may also be useful in 
analyzing a series showing marked positive skewness. The geometric 
mean, perhaps, deserves more attention than it has received in studies 
of population change and individual development. 


No unqualified answer can be given to the question of which
average, if any, should be used for a given series. The user can only 
be expected to have reasons for his choice of a particular average, 
reasons which are supported by the nature of the given data, the 
properties of the average, and the issue upon which the data are 
expected to throw light. The best advice that can be given to the 
student at this point is to become thoroughly acquainted with the 
limitations and advantages of each average. 

Central Tendency of the Qualitative Series. In the qualita- 
tive series, as was noted in Chapter I, frequencies in categories 
constitute the information of statistical interest. Obviously an 
average of frequencies, in the usual sense, would have little meaning. 
Suppose the religious preferences of a group of 525 individuals are: 


PREFERENCE       NUMBER
Catholic           150
Methodist          100
Baptist             75
Episcopalian        50
Presbyterian        50
Other              100


Neither a median nor a mean value of the data as classified would
have any useful meaning. The modal preference might be reported
as Catholic (or, if the preferences were classified as Catholic and 
Protestant, as Protestant), but the information of concern is the 
breakdown by categories. As a rule the concept of averages is not 
appropriate in dealing with qualitative series. 

Occasionally, however, it is useful to think of relative frequencies 
or proportions as arithmetic means. Suppose 60 in a group of 100 
students pass a test item and 40 fail it. The proportion passing is 
60/100. If the passes are scored “1” and the failures “0,” 60/100 is 
the arithmetic mean of the 100 scores on the item. (We shall return 
to this matter in connection with test item analysis.) In the same 
sense, a baseball batting average may be thought of as an arithmetic 
mean.

When we think about proportions as arithmetic means, we can 
readily see why two or more proportions or percentages cannot be 
combined without reference to their base numbers. Given the
proportions and their respective base numbers, $p_1$, $N_1$, and $p_2$, $N_2$, the
combined proportion $p_c$ will be


$$p_c = \frac{p_1 N_1 + p_2 N_2}{N_1 + N_2} \qquad (3.10)$$


which is exactly analogous to formula (3.6). (See exr. 43.)
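
Both ideas, a proportion as the mean of scores of 1 and 0, and the weighting of proportions by their base numbers, can be sketched in Python. The block is an illustration and not part of the original text; `combined_proportion` is a hypothetical helper, and the second class of 40 pupils is an invented example.

```python
def combined_proportion(proportions, bases):
    """Mean of proportions, each weighted by its base number."""
    weighted = sum(p * n for p, n in zip(proportions, bases))
    return weighted / sum(bases)

# A proportion is the arithmetic mean of scores of 1 (pass) and 0 (fail).
item_scores = [1] * 60 + [0] * 40
proportion_passing = sum(item_scores) / len(item_scores)

# Hypothetical figures: 60 of 100 pass in one class, 30 of 40 in another.
combined = combined_proportion([0.60, 0.75], [100, 40])
unweighted = (0.60 + 0.75) / 2   # ignores the base numbers

print(proportion_passing, round(combined, 3), unweighted)
```

The unweighted average of the two proportions is not the proportion passing in the combined group of 140, because the classes differ in size.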

Accuracy of Averages. Thus far nothing has been said about the
limitations imposed by the approximate nature of data upon the 
accuracy of the measures of central tendency, a matter of constant 
concern in computation. 

In finding the mean of a set of scores we find the sum of the
scores $\sum X$ and divide by the number N. Since N is an exact
number, the mean will have as many significant digits as the sum
of the scores. If, for example, $\sum X = 4{,}824$ and N = 84, the mean,
57.42857 . . . , has four-figure accuracy and would be reported as
57.43.

In the coded method of finding the mean, since $\sum X$ is never
explicitly determined, it is necessary to estimate $\sum X$ before the
accuracy of the mean can be determined. This can be done by
rounding the mean computed by the coded method to the same precision
as the least precise score in the series and multiplying the rounded
mean by N. The product thus obtained, an estimate of $\sum X$, provides
the information needed to judge the accuracy of the mean. To
illustrate, the mean of the distribution in Table 3.5 is found to be
26.737, which, rounded to the precision of the original measures,
becomes 27. The product of 27 and 38 (N) is 1,026. Since there are
four significant figures in 1,026, the mean is rounded to 26.74. It
should be noted that this procedure does not take into account 
errors of grouping, and will therefore usually exaggerate to some 
extent the accuracy of the mean as computed from grouped data. 
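
The rule of reporting the mean to as many significant figures as the sum of the scores can be sketched in Python. The block is an illustration and not part of the original text; both helper functions are hypothetical, the scores are assumed to be whole numbers, and every digit of the sum is counted as significant.

```python
import math

def significant_digits(total):
    """Number of digits in a whole-number sum, all counted as significant."""
    return len(str(abs(int(total))))

def round_to_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)

# Ungrouped case: the sum of the scores is known directly.
total, n = 4824, 84
reported = round_to_sig(total / n, significant_digits(total))

# Coded method: estimate the sum from the rounded mean and N.
coded_mean, n_coded = 26.737, 38
estimated_sum = round(coded_mean) * n_coded   # 27 * 38
coded_reported = round_to_sig(coded_mean, significant_digits(estimated_sum))

print(reported, estimated_sum, coded_reported)
```

As the text cautions, the count of figures obtained in this way takes no account of errors of grouping.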

The mode and median, being based upon counting, are some- 
times said to be exact. Since both, however, are usually affected by 
a change in the grouping scheme, in a practical sense they cannot 
be considered to be exact. As a rule, the mode should not be reported 
to greater precision than the precision of the original scores. The 
median, as a rule, should be reported to no greater accuracy than 
the mean. The accuracy of the mean, when desired as a criterion, 
can be quickly determined. (See hint, exr. 37.) 

The harmonic mean is based upon the sum of the reciprocals of 
the scores in a series. Although the reciprocal of a score has as many 
significant figures as the score (why?), the sum of the reciprocals 
will not necessarily have the same number of significant figures as 
the sum of the original scores. For the sake of convenience and 
uniformity, however, it is suggested that the harmonic mean be 
reported to the same number of digits as the arithmetic mean. 

Since the geometric mean is based upon the product of the scores 
in a series, it will have no more accuracy than the least accurate 
score. Hence, the geometric mean should be reported to as many 
digits as are contained in the score having the fewest significant 
digits. 

We shall return to the question of accuracy in Chapter VIII and 
shall deal with it there in terms of sampling fluctuations, which, 
under some conditions, impose further restrictions upon the number 
of places to be reported in a measure of central tendency. 


Exercises 


36. Describe a set of data, not mentioned in the text, for which (a) the 
mode would be appropriate; (b) the mean; (c) the median; (d) the 
geometric mean; (e) the harmonic mean. Under what conditions might
it be desirable to use more than one average in summarizing a given
set of data? 

37. Examine the distributions of Tables 2.3 and 2.6 and Fig. 2.12, 2.13, 
2.14, and 2.15. How many digits would you retain in the mean or 
median of each distribution? (Hint: The number of digits to be retained 
in the mean of a frequency distribution can be quickly determined. 
In Fig. 2.12, for example, the mean by inspection is roughly 120. 
If we multiply 120 by 47, the N of the distribution, we obtain 5,640 
as a quick approximation to the sum of the series. Although the ap- 
proximation may be poor, it assures us that there are four significant 
figures in the sum of the series. Hence, the mean or median of the dis- 
tribution of Fig. 2.12 is reported to four-figure accuracy.) 

38. What is wrong in the following: (a) The geometric mean of 1.06, 1.2, 
1.11, and 1.08 was reported as 1.112; (b) a teacher reported the mean 
IQ of his class of 32 pupils as 108.625.

39. Referring to Table 3.8, how do the mode, median, and mean of the 
distribution of student 2 compare as to size? 

40. Indicate in each of the frequency polygons below the relative position 
of the mode, median, and mean. 


[Figure: three frequency polygons, each with horizontal axis labeled Score]


41. In an experiment to determine whether one method of teaching plane
geometry was superior to a second method in increasing skill in critical 
thinking, two groups of geometry students were equated. One group 
was taught by the first method, the other by the second. The gains of 
the individual students under the respective methods, as indicated by 
the differences between scores on pre- and post-experiment tests of 
critical thinking, are shown in the distributions below. The experi- 
menter compared mean gains of the two groups and concluded that 
the first method was superior. Do you agree? Explain. What further 
study is suggested by the distribution of gains under the second 
method? 


GAINS          FIRST METHOD   SECOND METHOD
+25 to +29          2               1
+20 to +24          3          [illegible]
+15 to +19          3               6
+10 to +14          6               2
+5 to +9            8               1
0 to +4             5               2
-5 to -1            2               1
-10 to -6           1               1
-15 to -11          0          [illegible]
N                  30              32


42. Criticize each of the following uses of averages:

a. The mean age of men and women at time of marriage in a certain
city, based on records for a two-year period, was 26.4 yrs.

b. In an endowment drive put on by a small college, 1,252 donations
were received ranging from $1.00 to $250,000 and totaling $355,000.
The college announced an average donation of about $284.

c. A teacher's algebra class had a mean of 76 and a median of 68 on a
standardized achievement test. The teacher concluded that 76 was
the average grade.

43. The average daily attendance for the school year in a tenth-grade class
of 320 was 95 per cent; in the eleventh-grade class of 300, 92 per cent;
and in the twelfth-grade class of 240, 90 per cent. What was the per
cent of attendance for the three classes combined?

44. How does the geometric mean compare in size with the arithmetic
mean of a given series? With the harmonic mean? Study the matter
by examining several simple series. (Note: A general proof that
$HM \le GM \le AM$ is given in Ref. 2, pp. 830-831.)


Chapter IV


Characteristics of Statistical Series. 


Variability and Related Features 


THE CHAPTER heading, “Variability and Related Features,” is
not entirely appropriate, for all features of a statistical series are 
related to variation. The frequency distribution owes its distinctive 
properties to the extent and manner of variation of the items it 
comprises; it is the variation of statistical data which gives meaning 
and usefulness to the concept of average value. The whole of 
statistics might well be characterized as the study of variability. 

In the previous chapter, we have seen that averages, such as the 
mean, although necessary in describing and comparing series, 
conceal information about variability. 

The distributions of intelligence quotients for 4 eighth-grade 
classes are shown in Table 4.1. The distributions are observably 
different in the way the items vary about their respective means. 
More than a quarter of the IQ's in Schools C and G lie outside the
range of IQ's in School B; nearly a quarter of the IQ's in Schools
B, C, and G are above the highest IQ in School J. Important
information would be concealed if only the mean IQ's in the classes
were reported. 

For further illustration, consider again the data of Table 3.1, 
p. 76. The majority of the distributions in the table are
characterized by differences in amount and manner of variation of the
scores as well as in central tendencies. A comparison of arithmetic
achievement in the schools on the basis of average values alone, 
although possibly sufficient for some purposes, is not as complete 
and conclusive as the data permit. 

It is generally the case that we need to take variability into
account in describing and comparing statistical data. The variability
of a series may be more important and reveal more about the series 
than an average value. Besides, averages are always more meaning- 
ful and less susceptible to misinterpretation when accompanied by 
statements regarding variability. 


TABLE 4.1
DISTRIBUTIONS OF INTELLIGENCE QUOTIENTS IN 4
EIGHTH-GRADE CLASSES
(Data from Table I, Appendix B)


INTELLIGENCE
QUOTIENTS      SCHOOL B    SCHOOL C    SCHOOL G    SCHOOL J


130-134
125-129
120-124
115-119
110-114
105-109
100-104
95-99
90-94
85-89
80-84
75-79
70-74


The purpose of this chapter is to examine various methods of
measuring variability and the interpretation and use of the meas- 
ures. The amount or extent of variability of a statistical series is 
described quantitatively by various measures, the most common 
of which are range, quartile deviation, mean or average deviation, 
and standard deviation. These and their uses will be considered at 
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some length. Finally, measures of skewness and kurtosis, which 
relate to irregularities in variation, will be discussed. 

The computation of some of the measures used in describing and 
interpreting variability is somewhat laborious, but the meaning 
of the measures is not difficult to grasp. They are merely quantities 
by which we analyze and describe the amount and peculiarities of 
the variation characterizing statistical data. 


The Range 


The simplest way of describing the variability of the values in a 
series is to state the difference between the highest and lowest 
values. Such a difference is known as the range. The ranges of IQ's
in Schools B, C, G, and J, as listed in Table I, Appendix B, are:


School B, 114 - 81 = 33
School C, 131 - 74 = 57
School G, 125 - 54 = 71
School J, 104 - 64 = 40


When series are grouped, as are the IQ's in Table 4.1, the indi- 
vidual items lose their exact values, and there is no way of deter- 
mining the actual range of the series. For grouped series, either the 
difference between the mid-points of the highest and lowest class 
intervals, or the difference between the higher expressed limit in 
the top class and the lower expressed limit in the bottom class, may 
be taken as the range. Ordinarily the range is used with reference 
to ungrouped series. 
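
The conventions just described can be illustrated with a short computation. The following Python sketch is an illustration only; the function names are hypothetical, the extreme IQ's are those quoted above, and the grouped example assumes the School J classes run from 60-64 to 100-104, as listed later in this chapter.

```python
def ungrouped_range(highest, lowest):
    """Range of an ungrouped series: highest value minus lowest value."""
    return highest - lowest


def grouped_range_midpoints(bottom_class, top_class):
    """Range taken between the mid-points of the lowest and highest classes."""
    bottom_mid = (bottom_class[0] + bottom_class[1]) / 2
    top_mid = (top_class[0] + top_class[1]) / 2
    return top_mid - bottom_mid


def grouped_range_limits(bottom_class, top_class):
    """Range taken between the expressed limits of the extreme classes."""
    return top_class[1] - bottom_class[0]


# Highest and lowest IQ's quoted in the text for the four schools.
extremes = {"B": (114, 81), "C": (131, 74), "G": (125, 54), "J": (104, 64)}
ranges = {school: ungrouped_range(hi, lo) for school, (hi, lo) in extremes.items()}

# School J grouped in intervals of 5 (assumed classes 60-64 up to 100-104).
midpoint_range = grouped_range_midpoints((60, 64), (100, 104))
limit_range = grouped_range_limits((60, 64), (100, 104))

print(ranges)
print(midpoint_range, limit_range)
```

For a grouped series the two conventions need not agree with each other or with the actual range, which is one reason the range is ordinarily reserved for ungrouped series.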

When we examine the distributions of Table 4.1, we note that 
the ranges, although roughly indicative of dispersion or variability, 
fail to give any information about the variation of the IQ's between 
the extremes. The range of the great majority of IQ's in School G 
is from about 70 to 114; in School J, from about 80 to 104.

In general, the range is not a representative measure of the varia- 
bility of a series. Since it is based upon the values of the two 
extreme items, it tells nothing about the variation of the inter- 
mediate items and is highly sensitive to sampling fluctuations. 

The range is easily determined and easily understood. In a large, 
unimodal sample, it possesses some reliability as an estimate of 
variability in the population. It is chiefly useful, however, as a
supplementary measure. In most situations a statement regarding
range, in addition to a more representative and trustworthy measure 
of variability, adds to the description of the data. 


Exercises 


1. Determine the ranges of the distributions in Table 3.1 by finding the 
difference between the mid-points of the highest and lowest class intervals.
What is the chief limitation of the range as a measure of variability?

2. A reliable vocabulary test is given to a high school freshman class con- 
sisting of 190 girls and 178 boys. The girls’ scores range from 24 to 62; 
the boys’ from 17 to 68. Is there any evidence that the boys are more 
variable than the girls? In what respect is the evidence inconclusive?


The Quartile Deviation 
As we have seen, the range is the difference between the highest
and lowest values in a series. Described in another way, it is the
interval or distance on the scale which includes 100 per cent or all 
of the items in a given series. The chief limitations of the range are 
due to its dependence upon the two most extreme values. 

There are several methods of deriving measures of variability 
which are independent of extreme values. Before taking up the
most common of these, which is based upon the interval containing 
the middle 50 per cent of the items in a given series, we need to 
consider briefly the meaning of quartiles. 

The first quartile Q1 is defined as the point on the scale of scores
below which 25 per cent or a quarter of the scores lie; the second
quartile Q2 as the point below which 50 per cent lie; and the third
quartile Q3 as the point below which 75 per cent lie. (Since the point
below which 50 per cent of the scores lie is also designated the
median, Q2 and the median are obviously one and the same thing.)
Thus, Q1, Q2 or the median, and Q3 divide a scale of scores into four
intervals, each of which includes 25 per cent or one-quarter of the
scores, and the interval or distance between Q1 and Q3 includes
the middle 50 per cent.

The histogram of a distribution of IQ's is shown in Fig. 4.1. To
determine Q1, Q2, and Q3 we must find the points on the scale below
which 1/4, 1/2, and 3/4, respectively, of the IQ's lie. Since 1/4 of
38 is 9.5, Q1 will be the point below which 9.5 IQ's lie. This point
is easily located by a procedure similar to that employed in finding
the median. We need only to interpolate in the class interval in
which the 9.5 case falls. Since there are 9 cases below the 84.5-89.5
class, 11 cases in that class, and since the interval is 5, we have

$$Q_1 = 84.5 + \left(\frac{9.5 - 9}{11}\right) 5 = 84.73.$$

Similarly, noting that 1/2 and 3/4 of 38 are 19 and 28.5, respectively,
we have for Q2 and Q3

$$Q_2 = Mdn = 84.5 + \left(\frac{19 - 9}{11}\right) 5 = 89.05,$$

$$Q_3 = 99.5 + \left(\frac{28.5 - F}{f}\right) 5 = 102.00,$$

where F and f denote the number of cases below, and the number of
cases within, the 99.5-104.5 class.

Fig. 4.1. The quartiles in the histogram. (School C, Table 4.1.)


The values of Q1, Q2, and Q3 are shown in the illustration. It will
be noted that Q1 is closer than Q3 to the median. If the distribution
were symmetrical, they would be at equal distances from the
median. (Why?) The difference between Q3 and Q1 is the interval
or range which includes the middle 50 per cent of the IQ's in the
distribution.

In any distribution, the difference or range between Q3 and Q1
is designated the interquartile range. When the interquartile range
is divided by 2, the quotient is known as the semi-interquartile
range or the quartile deviation and is commonly abbreviated to Q.
Thus,

$$Q = \frac{Q_3 - Q_1}{2}. \tag{4.1}$$

For the IQ data above, the quartile measure of variability is

$$Q = \frac{102.00 - 84.73}{2} = 8.64.$$


The quartile deviation is sometimes spoken of as the average
deviation of the first and third quartiles from the median. In the
example, the deviations of Q1 and Q3 from the median are 4.32
and 12.95. Averaging these deviations we obtain 8.64, as before.

Computation of the Quartile Deviation. Little more needs
to be said about the computation of Q for a given distribution. After
1/4 and 3/4 of the cases in the distribution have been determined,
Q1 and Q3 can be readily computed by a process of interpolation
similar to that used in finding the median. The quartile deviation
is simply half of the difference between Q3 and Q1. We shall illustrate
the procedure again by computing Q for the distribution of IQ's in
School J, Table 4.1.


IQ          f    cum f

100-104     5     29
95-99       4     24    (interval containing Q3)
90-94       6     20
85-89       8     14    (interval containing Q1)
80-84       2      6
75-79       1      4
70-74       2      3
65-69       0      1
60-64       1      1

N = 29, N/4 = 7.25, 3N/4 = 21.75,

$$Q_1 = 84.5 + \left(\frac{7.25 - 6}{8}\right) 5 = 85.28,$$

$$Q_3 = 94.5 + \left(\frac{21.75 - 20}{4}\right) 5 = 96.69,$$

$$Q = \frac{96.69 - 85.28}{2} = 5.70.$$
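
The interpolation carried out by hand above can also be sketched in a few lines of Python. This is an illustrative sketch rather than a prescribed routine: the function name `grouped_quantile` and its arguments are hypothetical, and the frequencies are those of the School J distribution, listed from the lowest class (real lower limit 59.5) upward.

```python
def grouped_quantile(lower_limits, freqs, width, proportion):
    """Point on the scale below which `proportion` of the cases lie.

    lower_limits: real lower limits of the class intervals, lowest first.
    freqs: frequency in each class, in the same order.
    width: size of the class interval.
    """
    target = proportion * sum(freqs)  # number of cases to count off
    below = 0                         # cases below the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            # Interpolate within the class that contains the target case.
            return lower + (target - below) / f * width
        below += f
    raise ValueError("proportion must lie between 0 and 1")


# School J, classes 60-64 up to 100-104 (interval of 5).
lower_limits = [59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5, 99.5]
freqs = [1, 0, 2, 1, 2, 8, 6, 4, 5]

q1 = grouped_quantile(lower_limits, freqs, 5, 0.25)
median = grouped_quantile(lower_limits, freqs, 5, 0.50)
q3 = grouped_quantile(lower_limits, freqs, 5, 0.75)
q = (q3 - q1) / 2

print(round(q1, 2), round(median, 2), round(q3, 2), round(q, 2))
```

The sketch assumes, as the hand method does, that the cases in a class are spread evenly over the interval.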


The first and third quartiles, Q1 and Q3, are also designated the
twenty-fifth and seventy-fifth percentiles. The percentile system 
will be taken up in the next section and a generalized procedure for 
finding any percentile will be described. 

There are several ways of determining the quartile deviation for
ungrouped data, although none is entirely satisfactory because Q1
and Q3 can be defined exactly only in the frequency distribution.
The simplest way is that of taking one-half of the difference between
the score at or just above the N/4 position and the score at or just
above the 3N/4 position, counting from low to high after the data
have been arranged in order of size. It is possible to get more exact
results by interpolating on the scale of scores but rarely worth the
effort. The quartile measures computed from a series too short to
warrant grouping are ordinarily of very limited usefulness.
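
The simplest method for ungrouped data may be sketched as follows. The sketch is illustrative only: the function name is hypothetical, the ten scores are invented for the example, and "at or just above" a position is taken here to mean rounding the position up to the next whole number.

```python
import math


def simple_quartile_deviation(scores):
    """Half the difference between the scores at the N/4 and 3N/4 positions."""
    ordered = sorted(scores)                # arrange from low to high
    n = len(ordered)
    pos_q1 = math.ceil(n / 4)               # position at or just above N/4
    pos_q3 = math.ceil(3 * n / 4)           # position at or just above 3N/4
    # Positions are counted from 1; list indexes are counted from 0.
    return (ordered[pos_q3 - 1] - ordered[pos_q1 - 1]) / 2


# Hypothetical series of 10 scores: N/4 = 2.5 and 3N/4 = 7.5,
# so the third and eighth scores (17 and 27) are used.
scores = [12, 15, 17, 18, 20, 22, 25, 27, 30, 34]
q = simple_quartile_deviation(scores)
print(q)
```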

Uses and Limitations of the Quartile Deviation. Since it is
independent of the values of the extreme items in a series, the quartile
deviation is a more representative and trustworthy measure of
variability than the over-all range.

If a series is symmetrical, Q1 and Q3 are equidistant from the
median. Consequently, if we lay off a Q distance in both directions
from the median in a symmetrical distribution, we will include 50
per cent of the items. When a series is skewed, as is usually the
case, ± Q from the median include only approximately 50 per cent
of the items; however, as we shall see below, the approximation
tends to be good in even quite skewed series.

The quartile deviations of the C and J distributions in Table 4.1
have been found to be 8.64 and 5.70, respectively. The student
can verify that they are 9.28 for the B distribution and 11.88 for
the G distribution. Bringing together the medians and quartile
deviations of the four distributions, we have


SCHOOL      MDN        Q
B          91.17     9.28
C          89.05     8.64
G          92.00    11.88
J          89.92     5.70


Now suppose we are given only the medians and the Q's. What can 
we reasonably conclude? In the first place, we can be pretty sure 
that the IQ's in School G are the most variable; those in B, second 
in variability; those in C, third; and those in J, least variable. In 
the second place, we can be pretty sure that approximately 50 per
cent of the IQ's in each distribution fall in the interval from Mdn - Q
to Mdn + Q, so that, roughly, the middle 50 per cent fall in the
interval from about 82 to 100 in B, 80 to 98 in C, 80 to 104 in G,
and 84 to 96 in J.

Let us see how good these conclusions are. By methods to be described
in the next section it is possible to determine exactly the
percentage of items falling within a specified interval of a frequency
distribution. Applying the methods to the distributions in Table
4.1, we obtain the following results, the intervals being set by
taking Q distances below and above the medians.


DISTRIBUTION    INTERVAL         PERCENTAGE OF IQ'S IN INTERVAL
B               81.89-100.45     53.7
C               80.41-97.69      51.0
G               80.12-103.88     51.2
J               84.22-95.62      51.8


Our estimates based upon the information given by the medians 
and the quartile deviations were rather good, surprisingly so when 
we note how widely the distributions in Table 4.1 differ. 
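
The percentage for School J can be checked by interpolation. The following Python sketch is an illustration under stated assumptions: the function name `cases_below` is hypothetical, the School J frequencies are those given earlier in this section, and cases are assumed to be spread evenly over each class interval.

```python
def cases_below(point, lower_limits, freqs, width):
    """Number of cases falling below `point` in a grouped distribution."""
    total = 0.0
    for lower, f in zip(lower_limits, freqs):
        if point >= lower + width:
            total += f                              # whole class lies below
        elif point > lower:
            total += (point - lower) / width * f    # part of the class
    return total


# School J, classes 60-64 up to 100-104 (interval of 5).
lower_limits = [59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5, 99.5]
freqs = [1, 0, 2, 1, 2, 8, 6, 4, 5]
n = sum(freqs)

# Interval from Mdn - Q to Mdn + Q, i.e., 89.92 - 5.70 to 89.92 + 5.70.
low, high = 84.22, 95.62
inside = cases_below(high, lower_limits, freqs, 5) - cases_below(low, lower_limits, freqs, 5)
percentage = 100 * inside / n
print(round(percentage, 1))
```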

When we inspect the distributions in Table 4.1, we can see that
all of our conclusions are reasonable and that the quartile deviation
is a more representative and useful measure of the variability of the
IQ's than the over-all range. But the inspection also reveals a
serious limitation in the quartile deviation. Being based upon the
range of the middle 50 per cent of the items, it tells nothing about
the range of the lower and the upper 25 per cent. It is possible for
two distributions to have equal Q's, but quite dissimilar variability
in lower or upper quarters. Closely related to this weakness in the
quartile deviation is its nonalgebraic property. Like the median,
Q is not amenable to algebraic treatment. In a later section we shall
consider an algebraic measure of variability, the standard deviation,
which takes into account the variation of all of the items in a series.

In summary, the quartile deviation is easy to compute and easy 
to interpret. It is one-half of the range of the middle 50 per cent of 
the items in a series. Moreover, when a Q distance is laid off above 
and below the median, the interval demarcated includes approxi- 
mately 50 per cent of the items, unless the series is highly irregular. 
The quartile deviation is applicable to most frequency distributions, 
including those having unequal class intervals and those having 
bottom or top class intervals of unspecified length. It pairs naturally
with the median, and in situations in which the median is preferred
as the measure of central tendency, Q is preferred as the measure
of variability.

In describing a series, it is a good plan to report Q3 and Q1 as
well as the median and Q. Given the four summary figures, the 
reader can picture a great deal about the series. 


Exercises 


3. The median IQ in School C, Table 4.1, is 89.05. The difference between
the median and Q3 is 12.95; the difference between the median and
Q1 is 4.32. What does this indicate regarding the symmetry of the
distribution?

4. Show by graphic illustration that it is possible for two distributions to
have equal quartile deviations, yet differ markedly from each other.

5. Is it possible that a score in a distribution could lie at a distance greater
than 2Q from the median? Explain.

6. A teacher's analysis of reading scores in his class indicated that Q1 was
26 and that Q3 was 46. He concluded (a) that the median was 36, and
(b) that Q was 10. Do you agree with either conclusion?

7. Compute the quartile deviation of one or more of the frequency distributions
in Table 3.1, p. 76. What conclusions about the distributions
can you draw from the Mdn, Q3, Q1, and Q?

8. A common method of constructing a scale for measuring attitude toward,
say, social security is to have a large number of judges rate statements
of opinion about social security from 1 to 11, according to degree of
favorableness of the opinions. The statements are then assigned numerical
values equal to the respective medians of the judges' ratings.
Suppose that among the statements which have been rated by 50
judges, the ratings of statements i and j were distributed as shown
below. If only one of the two statements is to be included in the scale,
which should it be? Why?


JUDGE'S        OPINION
RATING        i       j
7                     3
6             4       7
5            13      11
4            24      12
3             6       9
2             2       8
1             1


Percentiles and Percentile Ranks 


Closely related to the quartiles and the quartile deviation are the
percentiles and interpercentile ranges. The Pth percentile is defined
as the point below which P per cent of the items in a series lie and
is commonly written Pp. It follows that Q1 is the twenty-fifth percentile,
Q2 or the median the fiftieth, and Q3 the seventy-fifth; i.e.,
Q1 = P25, Q2 = P50, and Q3 = P75. Just as the twenty-fifth, fiftieth,
and seventy-fifth percentiles constitute the quartiles, the tenth,
twentieth . . . ninetieth percentiles constitute the deciles.

The two most commonly used interpercentile measures of variability
are the range from the tenth to the ninetieth percentile, commonly
called the decile deviation D, and the range from the seventh
to the ninety-third percentile, the latter tending to fluctuate less
in samples from normal populations than any other interpercentile
range. Both measures are easily interpreted, the former as the range
which includes the middle 80 per cent of the measures, and the latter
as the range which includes the middle 86 per cent of the measures. In order to
compute these interpercentile ranges, we merely have to find the 
percentiles involved and subtract. 

Computation of Percentiles. We can find any percentile by a
procedure similar to that used in finding Mdn, Q1, and Q3. Let us
generalize the procedure in the formula

$$P_p = L + \left(\frac{pN - F}{f}\right) i, \tag{4.2}$$

in which Pp is the percentile desired, e.g., if the tenth percentile is
desired, Pp is P10; L is the lower real limit of the class interval containing
Pp; pN is the number of cases to be counted off to reach
Pp; F is the total number of cases below the class containing Pp;
f is the number of cases in the class containing Pp; i is the class
interval. We shall illustrate the application of the formula in computing
the percentiles needed in the interpercentile ranges mentioned
above.

A distribution of 293 reading test scores is shown in Table 4.2,
and the computation of percentiles is illustrated in the right-hand
space in the table.


TABLE 4.2
COMPUTATION OF PERCENTILES IN A DISTRIBUTION OF
293 READING SCORES
(Data from Table I, Appendix B)


SCORE      f    cum f
57-59      1     293
54-56      0     292
51-53      0     292
48-50     17     292
45-47     26     275
42-44     25     249
39-41     33     224
36-38     33     191
33-35     44     158
30-32     35     114
27-29     29      79
24-26     14      50
21-23     16      36
18-20     11      20
15-17      3       9
12-14      4       6
9-11       1       2
6-8        1       1

COMPUTATION OF PERCENTILES BY FORMULA (4.2)
Required to determine P7, P10, P90, P93:

 7% of 293 =  20.51
10% of 293 =  29.30
90% of 293 = 263.70
93% of 293 = 272.49

$$P_7 = 20.5 + \left(\frac{20.51 - 20}{16}\right) 3 = 20.60$$

$$P_{10} = 20.5 + \left(\frac{29.30 - 20}{16}\right) 3 = 22.24$$

$$P_{90} = 44.5 + \left(\frac{263.70 - 249}{26}\right) 3 = 46.20$$

$$P_{93} = 44.5 + \left(\frac{272.49 - 249}{26}\right) 3 = 47.21$$


Having obtained the percentiles as shown in the
table, we can readily find the interpercentile ranges:


P90 - P10 = 46.20 - 22.24 = 23.96,
P93 - P7 = 47.21 - 20.60 = 26.61.


The first interpercentile range tells us that the middle 80 per cent 
of the reading scores covers a range of 23.96 score units, extending 
from 22.24 to 46.20; the second that the middle 86 per cent covers a 
range of 26.61 score units, extending from 20.60 to 47.21. 
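
Formula (4.2) lends itself to a brief computational check. The Python sketch below is illustrative only; the function name `percentile_point` is hypothetical, and the frequencies are those of the 293 reading scores, listed from the 6-8 class upward. The frequency of 35 for the 30-32 class is inferred here from the adjoining cumulative frequencies (158 - 44 = 114 and 114 - 79 = 35) and should be treated as an assumption.

```python
def percentile_point(percent, lower_limits, freqs, width):
    """Formula (4.2): the point below which `percent` per cent of cases lie."""
    target = percent / 100 * sum(freqs)   # pN, the cases to be counted off
    below = 0                             # F, the cases below the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f
    raise ValueError("percent must lie between 0 and 100")


# Classes 6-8, 9-11, ..., 57-59 (interval of 3), lowest class first.
freqs = [1, 1, 4, 3, 11, 16, 14, 29, 35, 44, 33, 33, 25, 26, 17, 0, 0, 1]
lower_limits = [5.5 + 3 * k for k in range(len(freqs))]

p = {pct: percentile_point(pct, lower_limits, freqs, 3) for pct in (7, 10, 90, 93)}
for pct, value in p.items():
    print(pct, round(value, 2))
```

Interpercentile ranges found from the unrounded percentiles may differ in the last decimal place from those found from rounded values.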

Both P90 - P10 and P93 - P7 tend to be somewhat less affected
by sampling fluctuations than Q, but neither is as widely used as Q. 
They have their chief uses, as has the over-all range, as supplemen- 
tary measures of variability. Obviously, the more percentiles and 
interpercentile ranges that are reported, the more completely a 
series is described. A highly irregular series, if reduced to summary 
figures at all, is more fairly described by several measures of vari- 
ability than by one. These should, of course, be limited to the neces- 
sary figures. The major service of statistics in description is to 
reduce a series to the minimum summary figures needed to give a 
fair picture. 

The percentile system, in addition to providing various range 
measures of variability, is extremely useful in educational testing. 
We shall digress from the main topic of this chapter to consider the 
system as used in the analysis and interpretation of test scores. 
The student will find that some writers use the term centiles instead 
of percentiles. Except for brevity, nothing is gained by the innova- 
tion, and we shall stick to the older usage. 

The Cumulative Percentage Curve. The percentile system
provides a convenient method of translating any score X in a distribution
to the percentage of cases having scores below X. Since
the result is a sort of model distribution having an N of 100, a
comparison of the performances of two or more groups of unequal
sizes can be made. Percentiles also make it possible to compare the
performances of individuals in a single group on two or more tests,
regardless of how the separate tests are scored. It is because of this
latter fact that percentile scores are designated comparable scores.

We may examine the percentile system graphically by constructing
a cumulative percentage or percentile curve of the scores in Table
4.2. The cumulative frequencies in the third column of the table
must first be changed to cumulative percentages. The changes are
made merely by dividing the cumulative frequencies by 293 and
multiplying by 100. Thus, the lowest percentage is 1/293 × 100 or
.34 per cent, the cumulative percentage corresponding to the
cumulative frequency 50 is 50/293 × 100 or 17.06 per cent, and so
on.

Fig. 4.2. Percentile curve of a distribution of 293 scores. (See Table
4.2.)

The cumulative percentages corresponding to the cumulative fre- 
quencies are plotted in Fig. 4.2. The cumulative frequency scale is 
shown at the right as an aid to understanding the figure. In con- 
structing the percentile curve the cumulative frequency scale ordi- 
narily is not included, since it adds nothing useful to the graph. 

To construct a cumulative percentage curve for any distribution,
we must change the cumulative frequencies to cumulative percentages,
lay off a vertical percentile scale at the left of the horizontal
scale of scores, and plot the cumulative percentages above the corresponding
real limits of the class intervals. The latter is necessary
because cumulative percentage up to a class interval means the
percentage of cases falling below the real limit of the class interval.
The work is quite similar to that in constructing the cumulative
frequency curve (see p. 60), and is further illustrated in Fig. 4.3.
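
The conversion of cumulative frequencies to cumulative percentages can be sketched as follows. This Python sketch is illustrative only; the variable names are hypothetical, and the frequency of 35 for the 30-32 class of the reading scores is inferred from the cumulative frequencies and should be treated as an assumption. Each percentage is paired with the upper real limit of its class, the point above which it would be plotted.

```python
from itertools import accumulate

# Reading scores, classes 6-8, 9-11, ..., 57-59 (interval of 3), lowest first.
freqs = [1, 1, 4, 3, 11, 16, 14, 29, 35, 44, 33, 33, 25, 26, 17, 0, 0, 1]
n = sum(freqs)

cum_freqs = list(accumulate(freqs))               # cumulative frequencies
cum_pcts = [100 * cf / n for cf in cum_freqs]     # cumulative percentages
upper_limits = [8.5 + 3 * k for k in range(len(freqs))]

# Points of the percentile curve: (upper real limit, cumulative percentage).
for limit, pct in zip(upper_limits, cum_pcts):
    print(f"{limit:5.1f}  {pct:6.2f}")
```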

The percentile curve tells a great deal about a frequency distribution.
When it is of smooth ogive (S) shape, the distribution approximates
the symmetrical, bell-shaped form. When it is comparatively
narrow and steep, the distribution has comparatively little variability.
Percentiles and interpercentile ranges can be readily approximated
from the curve. For example, we can quickly determine
the approximate value of P90 for the distribution of reading scores
by going up the percentile scale to 90, horizontally across to the
curve, and vertically downward to the scale of scores. We strike the
latter scale at about 46; thus, 46 is roughly the point below which
90 per cent of the 293 scores fall. Similarly, we would determine
P10 as about 22, so that P90 - P10 is about 24, which is quite close
to the result obtained earlier by arithmetic computation.

We also can use the curve to approximate the percentile score
corresponding to an original or raw score in the distribution. For
example, if we want to know the percentile score of a raw reading
score of 38, we go along the horizontal scale to 38, vertically upward
to the curve, across to the percentile scale, and read 63. Thus, a
score of 38 in the present distribution has a percentile value of
about 63, i.e., it exceeds about 63 per cent of the 293 reading scores.
The student is asked to make other interpretations in exrs. 9 and 10.

Fig. 4.3. Percentile curves of four distributions. (From Table 3.1, p. 76.)
[Vertical axis: percentile scale; horizontal axis: score.]

Comparison of Distributions by Percentile Curves. When we 
need to compare graphically two or more distributions of unequal 
N's, we may change the frequencies in the classes to proportions 
(relative frequencies) and construct polygons, as was done in Fig. 
3.1, p. 77. Somewhat clearer and more useful graphic comparisons, 
however, are possible if we construct the percentile curves of the 
distributions on the same axes. 

Let us return to the A, B, E, and G distributions of Arithmetic 
Fundamentals Test scores in Table 3.1. The frequencies, cumulative 
frequencies, and percentages are shown below, and the percentile 
curves are drawn in Fig. 4.3, p. 129. 


SCORE |        A         |        B         |        E         |        G
      |  f  cum f  cum % |  f  cum f  cum % |  f  cum f  cum % |  f  cum f  cum %
51-53 |                  |                  |  1     35  100.0 |
48-50 |                  |                  |  1     34   97.1 |
45-47 |                  |                  |  3     33   94.3 |  1     32  100.0
42-44 |                  |                  |  8     30   85.7 |  0     31   96.9
39-41 |  1     23  100.0 |                  |  4     22   62.9 |  1     31   96.9
36-38 |  2     22   95.7 |                  |  8     18   51.4 |  3     30   93.8
33-35 |  9     20   87.0 |  2     18  100.0 |  3     10   28.6 |  2     27   84.4
30-32 |  4     11   47.8 |  5     16   88.9 |  2      7   20.0 |  4     25   78.1
27-29 |  4      7   30.4 |  7     11   61.1 |  3      5   14.3 |  4     21   65.6
24-26 |  2      3   13.0 |  0      4   22.2 |  1      2    5.7 |  3     17   53.1
21-23 |  1      1    4.3 |  2      4   22.2 |  0      1    2.9 |  2     14   43.8
18-20 |                  |  0      2   11.1 |  1      1    2.9 |  2     12   37.5
15-17 |                  |  1      2   11.1 |                  |  4     10   31.2
12-14 |                  |  1      1    5.6 |                  |  2      6   18.8
9-11  |                  |                  |                  |  1      4   12.5
6-8   |                  |                  |                  |  0      3    9.4
3-5   |                  |                  |                  |  3      3    9.4


The curves permit an astonishing number of comparisons between 
the distributions. To point out a few, the interquartile ranges for 
each distribution can readily be approximated by dropping vertical 
lines from the points where the twenty-fifth and seventy-fifth per- 
centile lines intersect the curve to the scale of scores and reading 
the distance thus demarcated. Other interpercentile ranges can be
determined similarly. The percentage of scores in the distributions
which fall below any given score can quickly be estimated. For
example, roughly 95 per cent of the scores in A and G and 100 per
cent of the scores in B fall below 38.5, which is approximately the
median score in E. The student is asked to make several other
comparisons in exr. 11. 

When graphic comparisons are desired, the percentile curve is 
extremely useful, perhaps the most useful and informative of the 
various graphic devices. It permits comparisons regardless of the 
sizes of the groups and answers to questions regarding overlap 
which are accurate enough for ordinary purposes. 

It should be noted here that there are better methods of compar- 
ing groups than graphic methods, methods which are precise and 
which also take into account chance differences between groups. 
When inferences about real differences in populations are being 
made from samples, chance differences obviously have to be taken 
into account. (See Chapter VIII.)

Percentile Rank. We have seen that the percentile system makes
it possible to translate percentiles to raw scores and, conversely,
raw scores to percentiles.

When the score of an individual in a group is expressed as a percentile,
i.e., as the percentage of the group which the individual
exceeds, the percentile commonly is called a percentile rank. Thus,
if an individual has a percentile score of 60, we know that he exceeds
60 per cent of the individuals with whom he is being compared, or
that his percentile rank is 60. The term percentile rank is a suitable
one, for it explicitly expresses the idea of position or rank on a scale
of 100.

The percentile rank of a given score may be quickly approximated
from the percentile curve, as has already been seen, or it may be
determined exactly by an arithmetic procedure. Let us determine
the exact percentile rank of a score of 38 in the distribution of the 
293 reading scores of Table 4.2. The score falls in the class interval 
35.5-38.5; hence, it clearly exceeds the 158 scores falling below 
that class interval. To find how many scores lie below 38 in the 
35.5-38.5 class, we assume that the 33 scores in that class are dis- 
tributed evenly over the interval and interpolate. As shown in Fig. 
4.4, the point 38 includes 2.5/3 of the 33 or 27.5 scores in the 35.5-38.5
class. In all, then, there are 185.5 scores falling below the score
38, and the percentile rank of 38 is 185.5/293 × 100 or 63.3.
This means simply that an individual having a score of 38
on the test exceeds 63.3 per cent of the group. Percentile
ranks ordinarily are reported to the nearest whole number.

Fig. 4.4. Interpolation in the class interval containing score
whose percentile rank is to be determined. (From Table 4.2.)

The arithmetic work in computing the percentile rank of any
score in a given distribution may be summarized in the formula

$$PR = \frac{100}{N}\left[F + \left(\frac{X - L}{i}\right) f\right], \tag{4.3}$$


in which X is the score whose percentile rank is desired;
N is, as always, the number of scores in the distribution;
F is the cumulative frequency up to the class interval containing X;
L is the lower real limit of the class interval containing X;
f is the frequency in the class containing X;
i is the class interval.

The application of the formula is illustrated in Table 4.3. 

The student should note the converse relationship between per- 
centile and percentile rank. The former is a score below which a 
specified percentage falls; the latter is the percentage below a spec- 
ified score. 
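
Formula (4.3) can be written directly as a small function. The Python sketch below is an illustration only; the function and argument names are hypothetical, and the two applications use figures quoted in this chapter: the reading score of 38 among 293 scores, and the score of 31 among the 67 dental school applicants (for which F = 36, L = 29.5, f = 15, and i = 5).

```python
def percentile_rank(x, n, below, lower, freq, width):
    """Formula (4.3): PR = (100 / N) * [F + ((X - L) / i) * f].

    x: score whose percentile rank is desired (X)
    n: number of scores in the distribution (N)
    below: cumulative frequency up to the class containing x (F)
    lower: lower real limit of that class (L)
    freq: frequency in that class (f)
    width: class interval (i)
    """
    return 100 / n * (below + (x - lower) / width * freq)


# Reading score of 38: class 35.5-38.5, 158 scores below, 33 in the class.
pr_reading = percentile_rank(38, 293, 158, 35.5, 33, 3)

# Spatial relations score of 31: class 29.5-34.5, 36 below, 15 in the class.
pr_spatial = percentile_rank(31, 67, 36, 29.5, 15, 5)

print(round(pr_reading, 1), round(pr_spatial, 1))
```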

Percentile Rank in Ordered Data. There are a great many
situations in which individuals or objects are “ordered” rather than
"measured" on some trait. For example, N students may be ranked
from “1” highest to “N” lowest in respect to initiative. Occasionally,
test results are better thought of as ordered data, rather than
as definite points on a scale of scores. High school and college class
ranks obviously are a special case of ordered data.

It is sometimes useful to transmute such ranks to percentile
scores. When this is done, the rank order of an individual with
respect to some trait can be directly compared with his performance
on tests which have percentile norms. Also comparisons between
ranked individuals can be made, regardless of differing numbers in
groups. For example, a college admissions office may change high
school class rank to percentile class rank in order to eliminate effect
of class size. (See Table II, Appendix B.)


TABLE 4.3
PERCENTILE RANK IN A DISTRIBUTION OF SCORES OF 67
DENTAL SCHOOL APPLICANTS ON A SPATIAL RELATIONS TEST


SCORE      f    cum f
40-44      6      67
35-39     10      61
30-34     15      51
25-29     14      36
20-24      8      22
15-19      3      14
10-14      2      11
5-9        4       9
0-4        5       5

APPLICATION OF FORMULA (4.3)
To find PR of 31:

X = 31, N = 67, F = 36, L = 29.5, f = 15, i = 5

$$PR = \frac{100}{67}\left[36 + \left(\frac{31 - 29.5}{5}\right) 15\right] = 60.4$$


Fig. 4.5. Histogram of ordered data. [Vertical axis: frequency;
horizontal axis: rank order.]


Suppose we have a group of 12 individuals who have been ranked 
from 1 to 12 in respect to some trait. A histogram of the data, since 
1 is highest and 12 is lowest, would be constructed as shown in 
Fig. 4.5, in which the mid-points of the class intervals are the respective
ranks and the frequency in each class is 1. To find the
percentile rank of, say, the individual who is in position 5, we need
merely to divide 7.5, the cumulative frequency below 5, by 12 and
express the quotient as a percentage. Thus, the percentile rank of
the individual who is in the fifth position in a group of 12 is 7.5/12 ×
100 or 62.5, which tells us that the individual exceeds 62.5 per cent
of the group.

It is left as an exercise for the student to show that

$$PR = 100 \left(\frac{N - R + .5}{N}\right), \tag{4.4}$$


in which R is the serial rank whose percentile value is desired and 
N the number of individuals ranked, it being agreed that “1” 
indicates the highest or best and “N” the lowest or poorest positions.

When there are ties for position, formula (4.4) is applicable, pro- 
vided the average of the serial ranks tied for is assigned to each of 
the ties. To illustrate: 


Individual A is first and he is assigned the rank of 1.
B and C tie for second and third and each is assigned the rank of 2.5.
D is fourth and he is assigned the rank of 4.
E, F, and G tie for fifth, sixth, and seventh, and each is assigned the rank of 6.
H is eighth and he is assigned the rank of 8.
(And so on until all individuals have been ranked.)
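
The treatment of ordered data, including ties, can be sketched in Python. The sketch is illustrative only: the function names and the eight ratings are hypothetical, and the percentile rank is written in the form 100(N - R + .5)/N, which is one algebraic statement of the relation worked out above for the fifth position in a group of 12.

```python
def percentile_rank_from_rank(rank, n):
    """Percentile rank of serial rank `rank` among n, with 1 the highest."""
    return 100 * (n - rank + 0.5) / n


def average_ranks(ratings):
    """Serial ranks (1 = highest); ties share the average of their ranks."""
    ordered = sorted(ratings, reverse=True)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # first serial rank occupied
        count = ordered.count(value)          # number of individuals tied
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[value] for value in ratings]


# Fifth position in a group of 12, as in the text.
pr_fifth = percentile_rank_from_rank(5, 12)

# Hypothetical ratings for individuals A through H, with two sets of ties.
ratings = [9, 8, 8, 7, 6, 6, 6, 5]
ranks = average_ranks(ratings)
percentile_ranks = [percentile_rank_from_rank(r, len(ratings)) for r in ranks]

print(pr_fifth)
print(ranks)
print(percentile_ranks)
```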


Uses and Limitations of Percentiles. The most frequent ap- 
plication of percentiles is in testing. Standardized tests, particularly 
those used above elementary school grades, usually report norms in 
terms of the percentile values of raw scores. If an individual ob- 
tains a score on a standardized test which has a percentile rank of 
35, we know that the individual exceeds 35 per cent of the group 
used in standardizing the test. This is straightforward and valuable 
information, provided the individual can fairly be compared with 
those in the standardizing group. 

When obtained scores or rank orders are changed to percentile
ranks, an individual's performances on tests or his positions in
ordered series can be brought into comparison, regardless of the
way tests were originally scored. Individual profiles, like that illustrated
in Fig. 4.6, frequently utilize percentile ranks as the common
unit.


Fig. 4.6. Profile of a dental school applicant's standing in 6 aptitudes.
(Figure not reproduced; the vertical axis is percentile rank, PR.)


In general, the percentile system of measurement, as we have seen, 
is used in four ways. In summary these are: 


a. To describe central tendency, variability, and form of frequency
distributions. The uses of the median and various interpercentile 
ranges in describing central tendency and variability have already 
been discussed. It will be seen later that several simple measures 
of skewness and kurtosis are based upon percentiles. 

b. To compare frequency distributions of similar scores. The cumu- 
lative percentage or percentile curve permits quantitative com- 
parisons of overlap and quantitative statements regarding dif- 
ferences not possible by any other graphic method. 
c. To express individual standing in a group in terms of percentage
of individuals in the group which the individual exceeds. Regardless
of the shape of the distribution or possible varying values
of the score units on the scale, a percentile rank has a straight- 
forward and easily understood meaning. 

d. To reduce to a common base the measures and ratings of an 
individual in two or more abilities. We shall return to this a little 
later in connection with standard scores. 


The percentile method of measurement has two notable weak- 
nesses. In the first place, percentile ranks are not subject to alge- 
braic treatment, and hence cannot logically be used when two or 
more scores are to be combined into a composite score. For example, 
there is no way to “average” the 6 percentile scores on the individual 
profile of Fig. 4.6. This weakness, however, does not tend to be of 
any very great importance in practical work. A composite score in 
most situations eliminates the very details which need to be taken 
into account. An individual's strength in one prerequisite ability 
can rarely be considered to counterbalance his weakness in a second. 

In the second place, the units of a percentile scale are not in pro- 
portional relationship to the units on the scale of scores. Consider 
the curve in Fig. 4.2. The interval between the ninetieth and 
hundredth percentiles corresponds to a reading score interval from 
about 46 to 59. In other words, 10 percentile units at the top of the 
distribution correspond roughly to 13 raw score units. The interval
between the fiftieth and sixtieth percentiles corresponds to a raw 
score interval from about 35 to 37. Thus, near the middle of the 
distribution, 10 percentile units correspond roughly to 2 raw score 
units. We may note, however, that from about the tenth to the 
ninetieth percentiles, fairly constant proportionality is obtained.
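
The same nonproportionality can be seen without reference to Fig. 4.2. The sketch below is an illustration only: it assumes a normal curve with mean 0 and standard deviation 1 in place of the reading-score distribution, and the function name `score_width` is hypothetical. It compares the width of the score interval covered by 10 percentile units near the middle with that covered by 10 percentile units near the top (89 to 99 is used because the hundredth percentile of a normal curve is not finite).

```python
from statistics import NormalDist

# Assumption: a unit normal curve stands in for the score distribution.
unit_normal = NormalDist(mu=0.0, sigma=1.0)


def score_width(lower_pr, upper_pr):
    """Width, in standard deviation units, of the score interval between two percentile ranks."""
    lower = unit_normal.inv_cdf(lower_pr / 100)
    upper = unit_normal.inv_cdf(upper_pr / 100)
    return upper - lower


middle_width = score_width(50, 60)  # 10 percentile units near the median
upper_width = score_width(89, 99)   # 10 percentile units near the top

print(round(middle_width, 3), round(upper_width, 3))
```

Under this assumption the upper interval is several times as wide as the middle one, which is the kind of difference the percentile scale obscures.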

No general statement can be made about the extent of nonpro- 
portionality in the percentile scale and the consequences of ob- 
scuring through its use relatively great differences near the top and 
bottom of a distribution. These depend upon the shape of the distribution
and the uses to which the percentile scores are to be put.
Obviously, if the scale of raw scores is accurate and trustworthy
throughout and the distribution is more or less bell-shaped, the
use of percentile measures introduces error. However, raw score
scales are frequently inaccurate, particularly at the extremes. When
this is the case, the percentile system, which makes only the modest 
assumption that the scores are ordered, is appropriate. The student 
will find extended discussion of the application of percentiles in 
educational measurement in Ref. 2. 



Exercises 


9. Determine arithmetically the seventy-fifth and twenty-fifth percentiles
of the distribution in Table 4.2 and compare the values with those obtained
from the percentile curve in Fig. 4.2.


10. Referring to Fig. 4.2,

a. What is the median of the distribution?

b. What score is exceeded by 95 per cent of the group?

c. About what per cent of the group exceed a score of 25?

d. About what per cent of the group fall below a score of 43?


11. Referring to Fig. 4.3,


a. Is it possible to tell roughly which of the four distributions is the 
most variable? Explain. 

b. What do irregularities in the curves indicate? 

c. If a moving average of the frequencies in distribution B were taken, 
how would the percentile curve be affected? 

d. What per cent of the B distribution is above the median of G? 

e. What per cent of the G distribution is below the first quartile of E?

f. A student having a percentile rank in School E of 30 would have 
about what percentile rank in A? In B? In G? 

g. Estimate Q, D, and Pos — P; for each of the four distributions. 


12. Construct percentile curves of the distributions of mental ages of the
sixth-, seventh-, and eighth-grade students of exr. 21, p. 63, on the
same axes. About what percentage of students in grade 7 fall below 
the median in grade 6? About what percentage of students in grade 7 
lie above the median in grade 8? Do these data roughly support the 
rather common belief that in a given elementary grade about 1/3 of 
the pupils are below the average of the grade just below in scholastic 
ability and about 1/3 above the average of the grade just above, 
assuming that mental ages are trustworthy measures of scholastic 
ability? 

13. A student has a percentile score of 35 on a test. Does this mean that
he is exceeded by 35 per cent of the group, that he exceeds 35 per cent
of the group, or that he equals 35 per cent of the group? 

14. A student has a percentile rank of 18. Why cannot it be said that he is 
in the lowest quartile? 

15. What is indicated by the relatively steep portion of a percentile curve? 
The relatively flat portion? 

16. Many school grading systems are based upon 100 with, say, 65 as the
passing score. What is the difference between a percentage grade in this 
system and a percentile rank? 

17. Two groups of students were given reading tests. For group A the 
median was 42. For group B, the median was 48. If a student has a 
percentile rank of 60 in group A, can it be concluded that he is equal 
in reading ability to a student who has a percentile rank of 60 in group 
B? What assumption is made in interpreting percentile scores? 

18. Suppose 8 specimens of handwriting are arranged in order of excellence 
from 1 or best to 8 or poorest. What is the percentile rank of each 
specimen? 

19. Given the information that in a distribution of salaries P₁₀ = $2,000,
P₂₅ = $2,250, P₅₀ = $2,450, P₇₅ = $3,200, and P₉₀ = $5,250, sketch
the distribution. 

20. What are several advantages and limitations of the percentile system 
of measurement? 

21. Determine either arithmetically or from the percentile curves in Fig. 
4.3 the percentage of scores falling in the interval from Mdn - Q to
Mdn + Q in the A, the B, the E, and the G distributions of Arithmetic
Fundamentals Test scores. 

22. The percentile norms for the standardized reading test used in obtaining 
the distribution shown in Table 4.2 and again in Fig. 4.2 are as follows: 
P₅ = 25, P₁₀ = 33, P₂₅ = 38, P₅₀ = 44, P₇₅ = 52, P₉₀ = 57, P₉₅ = 64.
How do the percentiles in the observed distribution compare with the 
norms? In comparing the observed distribution with the norms, what 
assumption do we make? 


The Average Deviation 


Let us return to measures of variability. The quartile deviation 
and other interpercentile range measures which have been discussed 
do not take into account the variation of the individual items in a 
series. When it is desired to consider all of the fluctuations in value 
which characterize a series, such measures cannot be used. 

The simplest method of taking into account the variation of all 
the items in a series is that of finding their average deviation from
a selected value, usually a point of central tendency. Either the 
mode, median, or the mean may be selected as the point of central 
tendency; since the mean ordinarily is used, however, we shall limit 
our discussion to the average deviation from the mean. 

The 18 IQ's of School B, Table I, Appendix B, are listed below, 
and the deviation of each from the mean value 93.61 is shown. 


        DEVIATION            DEVIATION            DEVIATION
 IQ     FROM MEAN     IQ     FROM MEAN     IQ     FROM MEAN
 85       -8.61       88       -5.61       81      -12.61
 84       -9.61       82      -11.61       94        0.39
 91       -2.61      105       11.39      114       20.39
100        6.39      102        8.39       88       -5.61
 81      -12.61      106       12.39       87       -6.61
106       12.39      101        7.39       90       -3.61


The algebraic sum of the deviations is, of course, zero (within 
rounding tolerance), but if we disregard signs, their sum is 158.22 
and their arithmetic mean is 158.22/18 or 8.79, to three-figure ac- 
curacy. The quantity 8.79 is the average of the absolute deviations 
of the IQ's from their mean.
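
The arithmetic above may be verified with a few lines of code. This is a minimal sketch, not part of the original text; the function name `average_deviation` is hypothetical, and the mean is carried unrounded rather than as 93.61, so the intermediate sum differs slightly in the later decimal places.

```python
# The 18 IQ's of School B as listed in the table above.
iqs = [85, 84, 91, 100, 81, 106, 88, 82, 105,
       102, 106, 101, 81, 94, 114, 88, 87, 90]


def average_deviation(scores):
    """Mean of the absolute deviations of the scores from their arithmetic mean."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


mean_iq = sum(iqs) / len(iqs)
algebraic_sum = sum(x - mean_iq for x in iqs)  # zero, apart from floating-point error
ad = average_deviation(iqs)

print(round(mean_iq, 2), round(ad, 2))
```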

When X represents the scores in a series, x is conventionally used
to represent the deviations of the scores from their mean, i.e.,
X - M = x, the mean always being subtracted from the score. Using
this notation, the average deviation from the mean is defined by

$$AD = \frac{\sum |X - M|}{N} = \frac{\sum |x|}{N} \tag{4.5}$$


in which the symbol || denotes absolute values and tells us that the 
algebraic signs are to be disregarded in the summation. The symbol 
AD is generally understood to indicate average deviation from the 
mean. If the median or some other point is selected from which to 
take deviations, the fact must be reported. The average deviation 
is also referred to as the mean deviation. 

Average Deviation in Grouped Data. When data are grouped
and the items are assumed to have the mid-values of their respective
class intervals, the average deviation, as defined in formula (4.5),
becomes

$$AD = \frac{\sum |f x'|}{N} \tag{4.6}$$

in which f is the frequency in a class and x' is the deviation of the
class mid-point from the mean.

Let us group the 18 IQ's of the preceding discussion, as shown in
Table 4.1, and compute the average deviation of the distribution.
The mean of the distribution is 93.94, so that the deviations of the
class mid-points are 112 - 93.94 or 18.06, 107 - 93.94 or 13.06,
and so on, as shown in the x' column below. Since the sum of the
fx' column, disregarding signs, is 162.76, the average deviation of
the grouped IQ's is 162.76/18 or 9.04. Owing to errors of grouping,
this differs slightly from the average deviation of the ungrouped
IQ's.

                    CLASS
 CLASS      f     MID-POINT       x'        fx'
110-114     1        112        +18.06     18.06
105-109     3        107        +13.06     39.18
100-104     3        102        + 8.06     24.18
 95-99      0         97        + 3.06
 90-94      3         92        - 1.94    - 5.82
 85-89      4         87        - 6.94    -27.76
 80-84      4         82        -11.94    -47.76
          N = 18                 Σ|fx'| = 162.76


To find the average deviation of a grouped series we need only to 


a. Find the deviations of the mid-points of the class intervals from 
the mean. 

b. Multiply each deviation by the corresponding class frequency. 

c. Add the products, disregarding signs.

d. Divide the sum by N.
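
The four steps may be sketched in code as follows. This is an illustration rather than part of the original text; the variable names are hypothetical, the class mid-points and frequencies are those of the School B table above, and the mean is carried unrounded.

```python
# (class mid-point, frequency) pairs for the grouped School B IQ's.
classes = [(112, 1), (107, 3), (102, 3), (97, 0), (92, 3), (87, 4), (82, 4)]

n = sum(f for _, f in classes)
mean = sum(mid * f for mid, f in classes) / n

# Steps a and b: deviation of each mid-point from the mean, times the class frequency.
products = [f * (mid - mean) for mid, f in classes]

# Steps c and d: add the products without regard to sign, then divide by N.
ad_grouped = sum(abs(p) for p in products) / n

print(n, round(mean, 2), round(ad_grouped, 2))
```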


There are several so-called shorter methods of finding the average 
deviation, but all are indirect and tend to make the beginning stu- 
dent lose sight of the meaning and simplicity of the statistic. 

Uses and Limitations of the Average Deviation. The average
deviation is the simplest measure of variability available that takes
into account the fluctuations of all the items in a series. It is the
most meaningful measure to the person untrained in statistics. The
concept of the average of all deviations from the mean of a series
is entirely intelligible as a measure of variability.
The average deviations of the IQ distributions in Table 4.1 are: 


School B 9.04 
School C 11.40 
School G 12.49 
School J 7.50 


The evidence that the IQ's in the G series deviate on the average
more from their mean than the IQ's in the B, C, and J series is
simple and clear. We know that the G series is characterized by
more variability, in the sense of average deviation from the mean
IQ, than the others.

The average deviation has two noteworthy limitations. Since it 
is based upon all of the deviations, it may be inflated by a single
extreme value. If one of the IQ's in the 90-94 class of the B series
had been, say, 145, the average deviation would have been about 
12 instead of 9.04. When a series is long, however, and not highly 
irregular at the extremes, this fact is of little moment. Moreover, 
extreme values inflate the average deviation somewhat less than 
other measures which take into account all deviations. 

The second limitation is of much greater consequence and ac- 
counts for the rather infrequent use of the average deviation. As 
has been seen, the signs of the deviations from the mean must be 
ignored in finding the average deviation. While the disregard of 
signs is entirely sensible, since negative deviations have the same 
influence upon amount of variation as positive, it results in a non- 
algebraic quantity. Consequently, the average deviation is unwieldy 
in mathematical operations and has very limited use in statistical 
theory. Except for a minor role in certain sampling problems (Ref. 
4), it is currently used only as a descriptive measure of variability. 
As such it has genuine merit. 


Exercises 


23. What does the word average refer to in the term average deviation? 

24. Compute the average deviation of the IQ's in the School J distribution 
of Table 4.1. 

25. How would you compute the average deviation from the median for 
grouped data? 


26. The median is sometimes defined as the value in a series from which the
sum of the absolute deviations is a minimum. (For proof, see Ref. 1, 
p. 143.) Compare the sum of the absolute deviations from the median 
to that from the mean in the School J distribution of Table 4.1. 


The Standard Deviation 


Of the several measures of variability, the standard deviation is 
by far the most used and important. It and its close relatives, the
“sum of squares” and the “variance,” occupy a central position
in statistical theory.

There is no satisfactory way of describing the standard deviation
other than by stating the operations by which it is calculated. Like
the average deviation, it is based upon the deviations of all values
in a series. In its calculation, however, the signs of the deviations
are not disregarded. Instead, the negative signs are eliminated by
squaring each deviation. After the deviations are squared, the
squares are summed, divided by N, and the square root of the
quotient is extracted, the final operation translating the quantity
back to the linear unit of measurement.

The operations in finding the standard deviation of a series thus 
include 


a. Finding the deviation of each value from the mean.

b. Squaring the deviations.

c. Summing the squares.

d. Dividing the sum by N.

e. Extracting the square root of the quotient.


To illustrate the procedure, let us find the standard deviation of a
simple series.

                                      DEVIATION FROM MEAN
  SCORE       DEVIATION FROM MEAN           SQUARED
    X              X - M or x                  x²
   22                 -1                        1
   20                 -3                        9
   25                  2                        4
   30                  7                       49
   18                 -5                       25
ΣX = 115    M = 23   Σx = 0                 Σx² = 88

The standard deviation of the series, in which N = 5, is the square
root of 88/5. Thus, S.D. = √17.6 or about 4.2.
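
The five operations can be followed step by step in code. The sketch below is an illustration only, with hypothetical variable names, applied to the same five scores.

```python
import math

scores = [22, 20, 25, 30, 18]
n = len(scores)
mean = sum(scores) / n

# a. deviation of each value from the mean
deviations = [x - mean for x in scores]
# b. and c. square the deviations and sum the squares
sum_of_squares = sum(d ** 2 for d in deviations)
# d. divide the sum by N (this quotient is the variance)
variance = sum_of_squares / n
# e. extract the square root of the quotient
sd = math.sqrt(variance)

print(mean, sum_of_squares, variance, round(sd, 1))
```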
When we represent the deviation of an item from the mean of its
series by x, we may define the standard deviation:

$$\text{S.D.} = \sqrt{\frac{\sum x^2}{N}} \tag{4.7}$$

If we square both members of (4.7) we have (S.D.)² = Σx²/N. The
square of the standard deviation is designated the variance, and
the quantity Σx² is designated the sum of squares. These are technical
terms which, in statistics, are always defined as above. In
other words, the sum of squares of a series is the sum of the deviations 
(from the mean) squared; the variance is the sum of squares divided 
by N; and the standard deviation is the square root of the variance. 
In the example above, the sum of squares is 88 and the variance is 
17.6. In a later chapter we shall find that the sum of squares and the 
variance are extremely useful in analyzing variability in some types 
of problems. 

We digress here for a moment to consider somewhat confusing
conventions respecting notation. Some writers use the symbol s to
denote the standard deviation of a given series; some use the Greek
symbol σ (sigma). In modern statistical theory, it is the preferred
practice to use σ to denote the unknown standard deviation in the
population and s to denote an estimate of σ, based on sample evidence.
Now it can be shown that the best estimate of σ results from
multiplying the standard deviation of the sample by the factor
$\sqrt{N/(N-1)}$, i.e.,

$$s = \text{S.D.}\sqrt{\frac{N}{N-1}}$$

Thus, S.D., s, and σ do not mean the same thing in statistical theory.
It would be preferable to use S.D. and only S.D. to denote the
standard deviation of a given series. However, the symbol S.D. is
clumsy, particularly when a subscript or superscript, e.g., S.D.ₓ or
S.D.², is needed. We shall therefore generally use σ to denote the
standard deviation in the sample. In referring to the standard deviation
in the population we shall use a caret over σ; thus, σ̂ will denote
the standard deviation in the population.
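
The distinction between S.D. and s can be made concrete. The sketch below is an illustration, not part of the original text; it assumes that the estimate s is obtained by multiplying S.D. by the square root of N/(N - 1), as stated above, and uses the Python standard library functions `statistics.pstdev` (divisor N) and `statistics.stdev` (divisor N - 1) on the fictitious five-score series.

```python
import math
from statistics import pstdev, stdev

scores = [22, 20, 25, 30, 18]
n = len(scores)

sd = pstdev(scores)               # S.D. of the given series: sum of squares divided by N
s_from_factor = sd * math.sqrt(n / (n - 1))
s_direct = stdev(scores)          # sum of squares divided by N - 1

print(round(sd, 3), round(s_from_factor, 3), round(s_direct, 3))
```

The two routes to s agree, and the factor approaches 1 as N grows, so the distinction matters chiefly in short series.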
Computation of the Standard Deviation of Ungrouped
Data. This has already been illustrated for a fictitious series and is
illustrated again in the left-hand side of Table 4.4 for the series of
18 IQ's. The mean of the IQ's is 93.61, and the deviations and their
squares are as shown. The computation tends to be laborious when
the mean is not integral, as in the present case, and an alternative
method utilizing only raw scores usually is preferable. The latter
method is based upon a formula derived from definition (4.7).


TABLE 4.4

COMPUTATION OF STANDARD DEVIATION BY DEVIATION
SCORE AND RAW SCORE METHODS
(Data from school B, table I, appendix B)

    DEVIATION SCORE METHOD               RAW SCORE METHOD
  X            x              x²           X          X²
 IQ     (IQ - Mean IQ)                    IQ
  85        - 8.61         74.1321         85        7,225
  84        - 9.61         92.3521         84        7,056
  91        - 2.61          6.8121         91        8,281
 100          6.39         40.8321        100       10,000
  81        -12.61        159.0121         81        6,561
 106         12.39        153.5121        106       11,236
  88        - 5.61         31.4721         88        7,744
  82        -11.61        134.7921         82        6,724
 105         11.39        129.7321        105       11,025
 102          8.39         70.3921        102       10,404
 106         12.39        153.5121        106       11,236
 101          7.39         54.6121        101       10,201
  81        -12.61        159.0121         81        6,561
  94           .39           .1521         94        8,836
 114         20.39        415.7521        114       12,996
  88        - 5.61         31.4721         88        7,744
  87        - 6.61         43.6921         87        7,569
  90        - 3.61         13.0321         90        8,100
 SUM          0.02      1,764.2778      1,685      159,499

By formula (4.7), since N = 18,

$$\text{S.D.} = \sigma = \sqrt{\frac{1{,}764.2778}{18}} = \sqrt{98.01543} = 9.90$$

By formula (4.8),

$$\sigma = \frac{1}{18}\sqrt{18(159{,}499) - (1{,}685)^2} = \frac{1}{18}\sqrt{31{,}757} = 9.90$$

Substituting X - M for x in (4.7), we have

$$\sigma = \sqrt{\frac{\sum (X - M)^2}{N}}$$

from which, as is shown in Appendix A, we obtain the formula

$$\sigma = \frac{1}{N}\sqrt{N\sum X^2 - \left(\sum X\right)^2} \tag{4.8}$$

in which ΣX² is the sum of the squares of the raw scores and (ΣX)²
is the square of the sum of the raw scores. The use of formula (4.8)
is illustrated in the right-hand side of Table 4.4, the squares of the 
IQ's being obtained from the list of squares of numbers included in 
Table I or Table J, Appendix C. 

It often is possible to save computational labor by subtracting a 
constant from each raw score. In the above example, if 80 is sub- 
tracted from each IQ a great deal of work will be saved. It is left as 
an exercise for the student to show that the standard deviation is 
not affected by changing each value in a series by a constant 
amount. 
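
Both the raw score method and the effect of subtracting a constant can be tried numerically. The following is a minimal sketch, not part of the original text; the function name `sd_raw_score` is hypothetical, and formula (4.8) is taken in the form σ = (1/N)√(NΣX² - (ΣX)²). A numerical check of this kind does not replace the algebraic demonstration asked of the student.

```python
import math

# The 18 IQ's of School B.
iqs = [85, 84, 91, 100, 81, 106, 88, 82, 105,
       102, 106, 101, 81, 94, 114, 88, 87, 90]


def sd_raw_score(scores):
    """Standard deviation from raw scores only: (1/N) * sqrt(N*sum(X^2) - (sum X)^2)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt(n * sum_x2 - sum_x ** 2) / n


sd = sd_raw_score(iqs)
sd_shifted = sd_raw_score([x - 80 for x in iqs])  # 80 subtracted from every IQ

print(round(sd, 2), round(sd_shifted, 2))
```

After the subtraction the largest square is that of 34 rather than 114, which is where the saving in labor arises.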

It will be noted that, in the above illustrations, more digits are re- 
tained in the squaring and summing procedures than the data war- 
rant. In a later section we shall consider the question of accuracy in 
computing the standard deviation. 

Computation of the Standard Deviation of Grouped Data. 
In finding the standard deviation of a series which contains more 
than about 30 items, it is possible to save labor by grouping. After 
a series is grouped, the items being assumed to have the mid-values 
of their respective class intervals, the most straightforward method 
of finding the standard deviation is by: 


a. Squaring the deviations (from the mean) of the mid-points of the 
class intervals. 

b. Multiplying each squared deviation by the corresponding class 
frequency f. 

c. Summing the products. 

d. Dividing the sum of the products by N. 

e. Extracting the square root of the quotient. 
The operations may be summarized in the formula, analogous to
(4.7),

$$\sigma = \sqrt{\frac{\sum f x'^2}{N}}$$

in which f is the frequency in a class and x' is the deviation of the
class mid-point from the mean. The computation is illustrated in
Table 4.5.


TABLE 4.5

COMPUTATION OF STANDARD DEVIATION OF GROUPED DATA
(Data from table 4.1, school C, mean IQ, 93.32)

  IQ              CLASS      DEVIATION OF CLASS
 CLASS      f   MID-POINT  MID-POINT FROM MEAN x'      x'²           fx'²
130-134     1      132           38.68             1,496.1424    1,496.1424
125-129     0      127           33.68             1,134.3424
120-124     1      122           28.68               822.5424      822.5424
115-119     2      117           23.68               560.7424    1,121.4848
110-114     2      112           18.68               348.9424      697.8848
105-109     2      107           13.68               187.1424      374.2848
100-104     3      102            8.68                75.3424      226.0272
 95-99      3       97            3.68                13.5424       40.6272
 90-94      4       92          - 1.32                 1.7424        6.9696
 85-89     11       87          - 6.32                39.9424      439.3664
 80-84      3       82          -11.32               128.1424      384.4272
 75-79      4       77          -16.32               266.3424    1,065.3696
 70-74      2       72          -21.32               454.5424      909.0848
  SUM      38                                                    7,584.2112

$$\sigma = \sqrt{\frac{7{,}584.2112}{38}} = 14.13$$

Although the method illustrated in Table 4.5 may always be used 
in finding the standard deviation of grouped data, and is the one 
to use if class intervals are not of uniform size, there is an easier 
method appropriate for most distributions. When class intervals 
are equal, as is usually the case, the deviations of the mid-points
from an arbitrary origin may be coded in class interval steps or d 
units and the squares of the deviations times their respective fre- 
quencies computed in the d unit. The method is most easily under- 
stood with reference to a specific distribution. 

Consider the distribution and the several columns shown in 
Table 4.6. The f and fd columns are completed as in finding the 
mean by the coded method. The entries in the fd? column are the 
products of the fd's times the corresponding d's. After the sums at the
foot of Table 4.6 are obtained, they are substituted in the formula 
(see Appendix A for derivation) 


$$\sigma = i\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \tag{4.9}$$

in which i is the class interval, and the other quantities refer to
sums obtained as those in Table 4.6. The application of the formula
is illustrated in the space below the table.


TABLE 4.6
COMPUTATION OF STANDARD DEVIATION OF GROUPED DATA
BY CODED METHOD
(Data from table 4.1, school C)

   IQ        f      d      fd     fd²
130-134      1      8       8      64
125-129      0      7
120-124      1      6       6      36
115-119      2      5      10      50
110-114      2      4       8      32
105-109      2      3       6      18
100-104      3      2       6      12
 95-99       3      1       3       3
 90-94       4      0
 85-89      11     -1     -11      11
 80-84       3     -2     - 6      12
 75-79       4     -3     -12      36
 70-74       2     -4     - 8      32
  SUM       38             10     306

$$\sigma = 5\sqrt{\frac{306}{38} - \left(\frac{10}{38}\right)^2} = 5\sqrt{7.983} = 14.13$$

Let us summarize the coded method of calculating the standard 
deviation in the following steps: 


a. After the data are grouped in a distribution having equal class 
intervals, an arbitrary origin is selected, and the class mid- 
points are coded in the d unit. Work is usually saved if the origin 
is selected near the middle of the distribution. 

b. An fd column is completed, as in finding the mean by the coded 
method. 

c. An fd² column is completed by multiplying the entries in the fd
column by the corresponding d entries.

d. (Σfd/N)² is found by dividing the algebraic sum of the fd column
by N and squaring the quotient.

e. Σfd²/N is found by dividing the sum of the fd² column by N.

f. The square root of the difference Σfd²/N - (Σfd/N)² is taken.

g. The resulting root is multiplied by i, the value of the class 
interval. 


If desired, formula (4.9) may be written

$$\sigma = \frac{i}{N}\sqrt{N\sum fd^2 - \left(\sum fd\right)^2} \tag{4.10}$$

a form particularly appropriate for use with a computing machine.
Applying formula (4.10) to the data of Table 4.6,

$$\sigma = \frac{5}{38}\sqrt{38(306) - (10)^2} = \frac{5}{38}\sqrt{11{,}528} = 14.13$$
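
The coded method lends itself to a short program. The sketch below is an illustration and not part of the original text; the variable names are hypothetical, and formulas (4.9) and (4.10) are taken in the forms σ = i√(Σfd²/N - (Σfd/N)²) and σ = (i/N)√(NΣfd² - (Σfd)²). The frequencies and d values are those of Table 4.6.

```python
import math

i = 5  # width of the class interval

# (frequency, d) pairs, the origin being the mid-point of the 90-94 class.
coded = [(1, 8), (0, 7), (1, 6), (2, 5), (2, 4), (2, 3), (3, 2), (3, 1),
         (4, 0), (11, -1), (3, -2), (4, -3), (2, -4)]

n = sum(f for f, _ in coded)
sum_fd = sum(f * d for f, d in coded)
sum_fd2 = sum(f * d * d for f, d in coded)

sd_4_9 = i * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
sd_4_10 = (i / n) * math.sqrt(n * sum_fd2 - sum_fd ** 2)

print(n, sum_fd, sum_fd2, round(sd_4_9, 2), round(sd_4_10, 2))
```

The second form postpones all division to the final step, which is presumably why it suits machine computation.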


The coded method is easy to apply and greatly reduces computa- 
tional labor. The standard deviation of the long series of Table 4.7 
is determined with little more effort than that of much shorter 
series. The student should practice the method until he becomes 
adept in its use.


TABLE 4.7

COMPUTATION OF STANDARD DEVIATION OF GROUPED DATA
BY CODED METHOD
(Data from table 4.2)

          f      fd      fd²
SUM      293     54    2,552

By formula (4.9),

$$\sigma = 3\sqrt{\frac{2{,}552}{293} - \left(\frac{54}{293}\right)^2} = 8.84$$


Sheppard’s Correction for Errors of Grouping. When the 
standard deviation is computed from grouped scores, the scores are 
assumed to lie at the mid-points of their respective class intervals. 
It is generally the case that the scores actually are scattered over 
the intervals, and, in the bell-shaped distribution, more scores tend 
to lie in the halves of the intervals which are nearer the mean of the 
distribution than in the farther halves. Hence, assigning the mid- 
points of the respective intervals to the scores in the various classes 
will generally result in a slight over-all exaggeration of the devia- 
tions of the scores from the mean of the distribution. The excesses 
do not systematically affect the value of the mean (why?), but, since 
the deviations are squared in computing the standard deviation,
the standard deviation will generally be somewhat too large. 

Sheppard (Ref. 6, p. 72) devised a correction for the slight inflation
of the standard deviation computed from grouped, normally
distributed data, and the correction generally results in a more
accurate value in any distribution which approximates normality.
The application of the correction results in the following formulas,
in which σ_c means the corrected standard deviation,

$$\sigma_c = i\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2 - .0833} \tag{4.11}$$

$$\sigma_c = \frac{i}{N}\sqrt{N\sum fd^2 - \left(\sum fd\right)^2 - .0833N^2} \tag{4.12}$$

Since the formulas are quite like formulas (4.9) and (4.10), no illustration
of their use is needed. The correction can be applied after
the standard deviation has been computed by the formula

$$\sigma_c = \sqrt{\sigma^2 - .0833\,i^2} \tag{4.13}$$


If there are more than about 10 classes, Sheppard's correction
tends to be negligible. Moreover, the correction constitutes a re- 
finement which may be quite inappropriate in view of the kind of 
data and distributions we ordinarily have in educational appraisal 
and research. 
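
The size of the correction is easily judged from an example. The sketch below is an illustration under stated assumptions: the correction is taken as the subtraction of i²/12 from the variance (1/12 being the exact value of which .0833 is the rounded form), the function name `sheppard_corrected` is hypothetical, and the inputs are the School C values σ = 14.13 and i = 5.

```python
import math


def sheppard_corrected(sd, interval):
    """Corrected standard deviation: square root of (variance - interval^2 / 12)."""
    return math.sqrt(sd ** 2 - interval ** 2 / 12)


sd_uncorrected = 14.13  # School C, grouped in intervals of 5
sd_corrected = sheppard_corrected(sd_uncorrected, 5)

print(round(sd_corrected, 2), round(sd_uncorrected - sd_corrected, 2))
```

With 13 classes the reduction is less than a tenth of an IQ point, which agrees with the remark that the correction tends to be negligible when there are more than about 10 classes.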

Combining Standard Deviations. Occasionally it is useful to 
be able to determine the standard deviation of a total group of 
scores, given the means and standard deviations of two or more 
subgroups making up the total. Since standard deviations are 
algebraic quantities, they can, of course, be manipulated alge- 
braically. Consider the general case in which the numbers, means, 
and standard deviations of k subgroups are: 


Subgroup 1     N₁     M₁     σ₁
Subgroup 2     N₂     M₂     σ₂
. . .
Subgroup k     N_k    M_k    σ_k

Let us represent the mean and the standard deviation of the total
group by M_t and σ_t respectively. The mean M_t may be determined
by an easy extension of formula (3.6), p. 97, and the standard
deviation by the formula

$$\sigma_t = \sqrt{\frac{N_1(\sigma_1^2 + M_1^2) + N_2(\sigma_2^2 + M_2^2) + \cdots + N_k(\sigma_k^2 + M_k^2)}{N_1 + N_2 + \cdots + N_k} - M_t^2} \tag{4.14}$$


To illustrate the use of the formula, we return to the distributions
of Table 4.1. The means and standard deviations of the grouped
intelligence quotients are brought together below.

                               STANDARD
SCHOOL     NUMBER     MEAN     DEVIATION
   B         18       93.94      10.02
   C         38       93.32      14.13
   G         32       91.22      15.96
   J         29       89.41       9.70

By extension of formula (3.6),

$$M_t = \frac{18 \times 93.94 + 38 \times 93.32 + 32 \times 91.22 + 29 \times 89.41}{18 + 38 + 32 + 29} = 91.87$$

Substituting the given means and standard deviations in formula
(4.14), we have

$$\sigma_t^2 = \frac{18(10.02^2 + 93.94^2) + 38(14.13^2 + 93.32^2) + 32(15.96^2 + 91.22^2) + 29(9.70^2 + 89.41^2)}{18 + 38 + 32 + 29} - 91.87^2$$

so that σ_t = 13.29.

The student can verify that when the four distributions in Table
4.1 are combined, the mean and standard deviation of the combined
distributions, determined directly, are 91.87 and 13.28, the discrepancy
between the two σ values being due to rounding.
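
The combination can be sketched in code. This is an illustration and not part of the original text; the function name `combine` is hypothetical, and formula (4.14) is taken in the form σ_t² = ΣN_j(σ_j² + M_j²)/ΣN_j - M_t². The subgroup values are those of Schools B, C, G, and J.

```python
import math

# (number, mean, standard deviation) for Schools B, C, G, and J.
subgroups = [(18, 93.94, 10.02), (38, 93.32, 14.13),
             (32, 91.22, 15.96), (29, 89.41, 9.70)]


def combine(groups):
    """Return the number, mean, and standard deviation of the total group."""
    n_total = sum(n for n, _, _ in groups)
    mean_total = sum(n * m for n, m, _ in groups) / n_total
    mean_square = sum(n * (sd ** 2 + m ** 2) for n, m, sd in groups) / n_total
    sd_total = math.sqrt(mean_square - mean_total ** 2)
    return n_total, mean_total, sd_total


n_total, mean_total, sd_total = combine(subgroups)
print(n_total, round(mean_total, 2), round(sd_total, 2))
```

Here the total mean is carried unrounded into the last step, so the result should fall close to the directly determined value; rounding M_t to 91.87 before squaring is what moves the hand computation to 13.29.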

The methods of finding the mean and standard deviation of a 
total group from those of subgroups are useful chiefly in two situa- 
tions: (1) when it is desired to determine the total mean and stand- 
ard deviation when all that remains of the original data are the 
numbers, means, and standard deviations of subgroups, and (2)
when it is desired to determine the effect of new groups upon old 
means and standard deviations without making up new distribu- 
tions; for example, when it is desired to revise test norms. 

It should be noted again that one of the great advantages the 
mean and standard deviation have over other measures of central 
tendency and variability lies in their algebraic nature. 

Uses and Limitations of the Standard Deviation. The stand- 
ard deviations of the distributions of Table 4.1, listed above, indicate 
that the IQ's of the G distribution are the most variable and those 
of the J distribution the least variable about the respective means. 
As a descriptive measure of variability, the standard deviation is 
always interpreted in this way, i.e., the greater its value, the more 
the scores scatter on an average from their mean. 

If we lay off distances equal to 1σ above and below the mean of a
normal distribution, the interval includes 68.3 per cent of the scores
or items; the interval ±2σ includes 95.4 per cent; and the interval
±3σ includes 99.7 per cent. These percentages, as well as those
included by ±.5σ, ±1.5σ, and ±2.5σ, are shown in Fig. 4.7. The
fact that σ intervals include fixed percentages of items in any normal
distribution is of fundamental importance in statistical theory.
The proportions of area included by successive .01σ distances from
the mean are shown in Table A, Appendix C. Since the area under
the polygon of the normal distribution corresponds to the total
number of items, N, of the distribution, the proportions of Table A
make it possible to determine the number of items falling within
any σ-unit interval.

One of the most interesting and useful applications of the stand- 
ard deviation in the normal distribution relates to determining the 
chances that a random score or statistic will not deviate from its 
expected value by more than some specified amount. To illustrate, 
if we select a score at random from a normal distribution, the 
chances are about 95 in 100 that it will not deviate from the mean
by more than ±2σ. This is true because only about 5 per cent of the
scores in a normal distribution lie outside the range ±2σ from the
mean. In later chapters we shall find that this application of the 
standard deviation is fundamental in determining the reliability 
of statistical predictions and inferences. 
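
The percentages quoted for the normal distribution can be reproduced without a table of areas. The sketch below is an illustration only; it uses `statistics.NormalDist` from the Python standard library, and the function name `percent_within` is hypothetical.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)


def percent_within(k):
    """Percentage of a normal distribution lying within k standard deviations of the mean."""
    return 100 * (unit_normal.cdf(k) - unit_normal.cdf(-k))


for k in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(k, round(percent_within(k), 1))

# Chances in 100 that a randomly selected score deviates from the mean by more than 2 sigma.
percent_outside_2_sigma = 100 - percent_within(2.0)
```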




While no such exact relationships exist in nonnormal distributions, 
there is a surprising tendency of σ intervals to include fairly stable
percentages of scores in distributions which show substantial de- 
parture from normality. This tendency is demonstrated in Table 
4.8 for 11 quite nonnormal and dissimilar distributions. 


[Figure: normal curve; horizontal axis marked from −3.0σ to +3.0σ in steps of .5σ, with M = 0 at the center.] 

Fig. 4.7. Percentages of items included within ±.5σ, ±1.0σ, ±1.5σ, 
±2.0σ, ±2.5σ, and ±3.0σ of the arithmetic mean in a normal 
distribution. 


The percentages in the table were computed by finding the percentile 
values of the end points of the intervals M ± 1σ, M ± 2σ, 
and M ± 3σ for each distribution and making the appropriate 
subtractions. To illustrate, in the A distribution, M − 1σ = 27.5. 
Referring to Table 3.1, the percentile value of 27.5 is 18.8. M + 1σ 
= 35.9, and 35.9 has a percentile value of 88.1. Hence, 88.1 − 18.8 
or 69.3 per cent of the scores in the A distribution lie in the interval 
M ± 1σ. The other percentages in the table were similarly 
calculated. 
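
A rough check of the same kind can be made on ungrouped scores. The sketch below is only an illustration, assuming Python; the scores and the names `mean_and_sd` and `percent_within` are hypothetical. Note that it counts scores directly rather than interpolating percentile values in a grouped distribution, so it is a simplification of the procedure described above, not a reproduction of it.

```python
import math

def mean_and_sd(scores):
    """Mean and standard deviation (divisor N) of a list of scores."""
    n = len(scores)
    m = sum(scores) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in scores) / n)
    return m, sd

def percent_within(scores, k):
    """Percentage of scores lying in the interval M - k*sd to M + k*sd."""
    m, sd = mean_and_sd(scores)
    low, high = m - k * sd, m + k * sd
    inside = sum(1 for x in scores if low <= x <= high)
    return 100.0 * inside / len(scores)

# Hypothetical scores, invented for this illustration.
scores = [22, 25, 27, 28, 29, 30, 30, 31, 31, 32,
          32, 33, 33, 34, 35, 36, 37, 38, 40, 42]

for k in (1, 2, 3):
    print(f"M +/- {k} sigma: {percent_within(scores, k):.1f} per cent")
```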

Standard deviation intervals include invariant percentages of 
items only in the normal distribution. However, unless observational 
data are characterized by little or no “piling up” somewhere between 
extreme values, there is a marked tendency for about 2/3 
of them to fall in the interval M ± 1σ, about 95 per cent to fall 
in the interval M ± 2σ, and for all to fall in the interval M ± 3σ. 
The tendency is seen again in Table 4.9. It will be noted that the 
first two percentages in the B distribution are quite different from 
the others. The reason for this is revealed by inspection of that 
distribution in Table 4.1. 


TABLE 4.9 
PERCENTAGES INCLUDED IN UNIT STANDARD 
DEVIATION INTERVALS 
(Distributions from Table 4.1) 


                                        DISTRIBUTION 
INTERVAL                    B            C            G            J 
                        M = 93.94    M = 93.32    M = 91.22    M = 89.41 
                        σ = 10.02    σ = (illegible)  σ = 15.96    σ = 9.70 
1σ below mean to 1σ 
  above mean              56.3%        66.9%        67.8%        67.6% 
2σ below mean to 2σ 
  above mean              99.4%        95.8%        93.8%        95.9% 
3σ below mean to 3σ 
  above mean             100.0%       100.0%       100.0%        99.4% 

The standard deviation, as an algebraic quantity, has many uses 
denied other measures of variability. As our study of statistical 
methods proceeds, we shall find that it deserves its place as the 
“master” measure of variability. It not only is the most trustworthy 
of the several measures, as a rule, but is indispensable in correla- 
tional work, test and test item analysis, and in judging the reli- 
ability of statistical predictions and inferences. These uses will be 
taken up later. 

The standard deviation has an interesting property, one which is 
highly important in statistical theory. In any series, the sum of 
squares of deviations of the items from their arithmetic mean is less 
than the sum of squares of deviations of the items from any other 
value. This is demonstrated in Appendix A. As a consequence, the 
standard deviation is less than any similar “root-mean-square” 
computed about a value other than the mean. 
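
The least-squares property can be tried on a small series. The following is a minimal sketch, assuming Python; the series and the function name `sum_of_squares_about` are hypothetical. It also prints the quantity SS(M) + N(M − A)², which by a standard algebraic identity equals the sum of squares about any other point A, so the excess over the minimum is always N(M − A)².

```python
def sum_of_squares_about(values, point):
    """Sum of squared deviations of the values from an arbitrary point."""
    return sum((x - point) ** 2 for x in values)

series = [2, 4, 4, 5, 7, 8]          # hypothetical series with mean 5
n = len(series)
m = sum(series) / n

ss_mean = sum_of_squares_about(series, m)
print("about the mean:", ss_mean)

# The sum of squares about any other point A exceeds the minimum by N(M - A)^2.
for a in (3, 4, 6, 7):
    ss_a = sum_of_squares_about(series, a)
    print(f"about {a}: {ss_a}  (minimum + N(M - A)^2 = {ss_mean + n * (m - a) ** 2})")
```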

The standard deviation has two distinct limitations. It is extremely 
difficult to interpret to those untrained in statistics, and 
as a rule should not be used in describing statistical data to the 
untrained. Its second limitation arises from its sensitivity to extreme 
values. By studying several simple series, the student can easily 
satisfy himself that the standard deviation is more affected by 
extreme values than other measures, with the exception of the over- 
all range. Hence it is of doubtful propriety for series containing a 
few extreme values relative to the majority. 

Like the mean, the standard deviation is a member of the moments 
system (see p. 180), and consequently teams naturally with the 
mean. When the one is an appropriate measure of central tendency, 
the other ordinarily is an appropriate measure of variability. 


Exercises 


27. In what sense is the standard deviation an average of deviations from 
the mean? 

28. The sum of squares of a series of 36 items is 1,296. What are the variance 
and standard deviation? 

29. Find the average deviation and the standard deviation of the two 
series below. Which is more affected by the extreme score in the second 
series? 

30 30 
30 30 
32 32 
33 33 
34 34 
35 35 
35 35 
35 35 
36 36 
40 60 


30. What is the effect upon the standard deviation of adding or subtracting 
a constant to or from each score in a series? Of multiplying each score 
by a constant? (Experiment with several simple series, or devise a 
general demonstration.) 
31. The standard deviations of IQ's in two classes are 16.0 and 8.0, 
respectively. The mean in each class is 110. Which group would you 
rather teach? Why? 

32. Verify that the mean and standard deviation of the distribution formed 
by combining the four distributions of Table 4.1 are 91.87 and 13.28, 
respectively. 

33. Why is the standard deviation not an appropriate measure of the 
variability of the distribution of salaries given in exr. 12, Chapter III? 

34. In a small high school, a standardized English usage test was given 
to the upper three classes, with the following results: sophomores, 
N = 92, M = 41.0, σ = 12.0; juniors, N = 80, M = 48.0, σ = 9.0; 
seniors, N = 75, M = 45.0, σ = 10.0. Find the mean and the 
standard deviation for the three classes combined. Interpret these. 

35. The scores of 67 dental school applicants on the Miller Survey of 
Mechanical Insight are shown below. 


SCORE      f 
32-35      4 
28-31      3 
24-27     12 
20-23     10 
16-19     16 
12-15     11 
 8-11      6 
 4-7       3 
 0-3       2 


a. What are the mean and standard deviation of the distribution? 

b. What is the percentile value of the point M − 1σ? Of the point 
M + 1σ? What percentage of scores falls in the interval M ± 1σ? 

c. What percentage of scores falls in the interval M ± 2σ? 

d. What percentage of scores falls in the interval M ± 3σ? 

e. What is the corrected standard deviation σ_c of the scores? 

36. (a) The mean of an approximately normal distribution of test scores 
is 37.5 and σ is 7.5. If a student made a score of 60, how many σ's 
above the mean would his score be? In what sense is the score 
exceptional? (b) The mean of a second distribution is 37.5 and σ is 15.0. 
Is a score of 60 in this distribution as exceptional as in the first? Explain. 

37. In a normal distribution of 1,000 scores, how many scores are within 
and how many outside each of the following intervals: ±.5σ from the 
mean, ±1.0σ, ±2.5σ, ±3.0σ? 

38. Suppose you were called upon to explain the meaning of the standard 
deviation to a group untrained in statistics. What would you say? 


The Coefficient of Variation 


The several measures of variability we have considered are 
denominate, for they are expressed in the units of the original values. 
The standard deviation of a series of heights measured in inches, 
for example, is in the inch unit. The standard deviation of test 
scores is in the same unit as the scores. When two series are expressed 
in the same unit and have approximately equal average value, their 
variability can be compared directly by use of Q, AD, or σ, as was 
done in comparing the variability of the IQ's in Table 4.1. 

It is sometimes desirable to compare the variability of two series 
which are expressed in dissimilar units or which have quite different 
average values. To do this we need a measure of variability which 
is independent of unit and which takes averages into account. 

There are several abstract measures of variability, the most common 
of which is the coefficient of variation, also known as the relative 
standard deviation. This is defined as the standard deviation divided 
by the mean and is ordinarily expressed as a percentage, i.e., 


CV = 100σ/M. (4.15) 


Since CV is the quotient of two quantities having the same unit, it 
is independent of unit. Since it expresses the standard deviation as 
a percentage of the mean, it provides a measure of variability relative 
to average value. The importance of this will be seen a little 
later. Because it is highly susceptible to misinterpretation, the coefficient 
of variation is not in good repute in educational research. 
When it is interpreted cautiously, however, it is an informative 
statistic and one for which there is no substitute. Let us consider 
several applications of the coefficient. 

Comparing Variabilities When Means Are Unequal. It is 
usually the case that a comparison of two series expressed in the 
same unit but having unequal means can be made more fairly in 
relative than in absolute terms. It is perhaps in the nature of things 
that a series of large values tends to have greater deviations from 
the mean than a series of small values; at any rate, it is observable 
that the former tends to vary more than the latter. 
Consider the means and standard deviations of the heights of a 
large group of 4-year-old and of 14-year-old girls, listed below: 


GROUP            M           σ 
4-year-olds      40.0 in.    1.5 in. 
14-year-olds     62.6 in.    2.2 in. 


For the 4-year-olds, CV = 1.5 × 100/40.0 or about 3.8 per cent; 
for the 14-year-olds, CV = 2.2 × 100/62.6 or about 3.5 per cent. 
Hence, relative to their own means, the two groups show approximately 
the same variation. Based upon absolute units, the variability 
compares as 1.5 to 2.2 or as about 2 to 3. The former comparison 
is the fairer and more informative one. 
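
The arithmetic of this comparison can be sketched as follows. This is a minimal illustration, assuming Python; the function name `coefficient_of_variation` is introduced here and is not the text's notation. It applies formula (4.15) to the two groups of heights given above.

```python
def coefficient_of_variation(mean, sd):
    """CV as a percentage of the mean: 100 * sd / mean, as in formula (4.15)."""
    return 100.0 * sd / mean

# Means and standard deviations of height (inches) from the table above.
cv_4_year_olds = coefficient_of_variation(40.0, 1.5)
cv_14_year_olds = coefficient_of_variation(62.6, 2.2)

print(f"4-year-olds:  CV = {cv_4_year_olds:.1f} per cent")
print(f"14-year-olds: CV = {cv_14_year_olds:.1f} per cent")

# Ratio of the absolute variabilities, for contrast with the relative ones.
print(f"ratio of standard deviations: {1.5 / 2.2:.2f}")
```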

As another example, suppose a test of reasoning skill were given 
to a group of seventh-graders and to a group of twelfth-graders and 
that the means and standard deviations were: 


GROUP              M       σ 
Seventh-graders    18.0     4.0 
Twelfth-graders    50.0    11.0 


For both groups, CV is about 22 per cent. Hence, the two groups 
may be said to be similar in variation relative to their means. 

The latter comparison is the sort which has led to the disparage- 
ment of the coefficient of variation in education measurement. It 
can be argued that the comparison is entirely arbitrary. If the rea- 
soning test in our example had contained an additional 10 very easy 
items which all in both groups could work correctly, the coefficients 
would have been 4 × 100/28 or about 14 per cent and 11 × 100/60 
or about 18 per cent, respectively, since adding 10 to each score 
would affect the means but not the standard deviations. 
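
The effect described in this argument is easy to demonstrate. The sketch below is a minimal illustration, assuming Python and its standard `statistics` module; the raw scores are hypothetical. Adding 10 to every score leaves the standard deviation unchanged but raises the mean, so CV falls. The last line repeats the text's own figures for the two groups.

```python
from statistics import mean, pstdev

def coefficient_of_variation(scores):
    """CV as a percentage, using the standard deviation with divisor N."""
    return 100.0 * pstdev(scores) / mean(scores)

scores = [12, 14, 18, 20, 26]            # hypothetical raw scores
padded = [x + 10 for x in scores]        # ten very easy items added for everyone

print("standard deviations:", pstdev(scores), pstdev(padded))
print("CV before:", round(coefficient_of_variation(scores), 1))
print("CV after: ", round(coefficient_of_variation(padded), 1))

# The text's figures: means 18 and 50 become 28 and 60, sigmas stay 4 and 11.
print(round(100 * 4.0 / 28.0), round(100 * 11.0 / 60.0))
```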

Although true, the argument is impressive only in the sense that 
it emphasizes the necessity to interpret educational test measures 
with caution. When there is no known zero point on a scale of meas- 
urement, as is generally the case in educational testing, comparisons 
of any sort whatsoever have to be made and reported with the 
greatest of care. To say that the seventh- and twelfth-graders of the 
above example are equally variable in reasoning when the means 
are taken into account, without reference to the test and the situa- 
tion, would be to make a meaningless statement. For a given educa- 
tional test as constructed, however, it may be of value to know 
whether various age or other kinds of groups are equal or unequal 
in variability of performance relative to their own means. Only the 
coefficient of variation or some similar ratio of variability to average 
value can provide such information. 

Comparing Variabilities When Units Are Unlike. Not only 
does the coefficient of variation provide a fair comparison of similar 
series having unequal means, it also makes possible comparison of 
series expressed in dissimilar units. Suppose we have given the means 
and standard deviations of the weights and heights of a group: 


VARIABLE    M             σ 
Height      68.8 in.      2.5 in. 
Weight      141.5 lbs.    17.5 lbs. 


and we want to know whether the group is more variable in weight 
than in height. For height, CV is about 3.6 per cent, and for weight 
about 12 per cent. Hence, variability in weight in the group is more 
than 3 times as great as that in height, when the two are stated as 
percentages of their means. 

By the same procedure we might determine whether teachers are 
more variable in respect to years of experience than in respect to 
salary, or whether students are more variable, say, in respect to 
speed of reading than in speed of typing. There are a great many 
situations in which the variability of dissimilar series can profitably 
be compared, although the student will find that the correlational 
procedures to be described in a later chapter usually can be made 
to provide more useful information in these situations than the 
coefficient of variation. 

Exceptional Variation. The coefficient of variation is sometimes 
useful as an aid in judging whether a series is too variable to have a 
meaningful average. When CV is greater than about 35 per cent, 
it may be doubted whether an average is an appropriate statistic. 
The distributions in which CV reaches this magnitude are usually 
J-shaped, rectangular, or markedly multimodal. The check does 
not work the other way, however, since such distributions may have 
low CV's if only because of high average value. 

At the other extreme, CV may be useful in judging whether the 
variability of a set of values is exceptionally small. When CV is 
less than about 5 per cent, the data are less variable than we 
ordinarily anticipate in educational and psychological research. 
Small CV's raise doubt regarding correctness of computation of 
means and standard deviations, and, in experimental situations, 
questions regarding possible overselection or overcontrol. The 
student will find that the great majority of statistical series in educational 
work are characterized by CV's between about 5 and 35 
per cent. Although series in which CV falls outside this range can 
and do arise, they invite special analysis and interpretation. 

The coefficient of variation is particularly liable to misinterpreta- 
tion in educational measurement. Interpreted with care and in light 
of what is known about the measuring instrument, however, it 
may provide valuable information not otherwise available. Harris 
and others (Ref. 5) present several interesting applications of the 
coefficient, and Snedecor (Ref. 7) discusses its use in planning 
experiments. 


Exercises 


39. Compute the coefficients of variation of the IQ's whose means and 
standard deviations are given in Table 4.9. Interpret the coefficients. 
Why are they not needed in this case? 

40. The following data are taken from the 1943 Statistical Abstract of the 
United States, pp. 215, 291. In what way is the coefficient of variation 
better than the quartile or standard deviation in comparing the vari- 
ability of expenditures? 


Per Capita State Expenditures       Per Capita State Expenditures 
for General Purposes in 1940        for Public Schools in 1940 

DOLLARS           f                 DOLLARS           f 
85.00-89.99       1                 27.00-28.99       1 
80.00-84.99       0                 25.00-26.99       2 
75.00-79.99       1                 23.00-24.99       3 
70.00-74.99       0                 21.00-22.99       2 
65.00-69.99       2                 19.00-20.99       6 
60.00-64.99       1                 17.00-18.99      13 
55.00-59.99       3                 15.00-16.99       4 
50.00-54.99       6                 13.00-14.99       5 
45.00-49.99      10                 11.00-12.99       4 
40.00-44.99       7                  9.00-10.99       4 
35.00-39.99       5                  7.00-8.99        4 
30.00-34.99       4 
25.00-29.99       7 
20.00-24.99       1 


41. The coefficients of variation of teacher experience and salaries are 36 
per cent and 15 per cent, respectively, in one school district. In a second 
district they are 32 per cent and 24 per cent. If mean experience and 
mean salary in the first district are approximately equal to those in 
the second district, in which district would you prefer to begin teaching? 
Why? 

42. An investigator gave a test of English usage and one of spelling to a 
group of students. The English test consisted of 60 items and the 
spelling test of 60 words. The means and standard deviations were 
38.5 and 6.0, respectively, on the English test and 30.0 and 10.0 on 
the spelling test. The investigator concluded that students are more 
variable in spelling than in English usage. Criticize the conclusion. 
Does the coefficient of variation have useful meaning under such con- 
ditions? Explain. 


Standard Scores 


If the mean and standard deviation of a series are known, it is 
possible to express the deviation of any score from the mean as a 
multiple of the standard deviation. When a score is expressed in this 
manner, it is commonly called a standard or z score.* Symbolically, 


z = (X − M)/σ, (4.16) 


the mean always being subtracted from the score. Since the deviation 
X − M commonly is represented by x, definition (4.16) may be 
written z = x/σ. 

To transform a set of raw scores to standard scores, we need only 
to find the mean and standard deviation of the set and divide the 
respective deviations of the scores from their mean by the standard 
deviation. In the A series of Table 3.1, for example, in which M = 
31.7 and σ = 4.2, a raw score of 37 has the standard or z score 
equivalent of (37 − 31.7)/4.2 or about 1.3; and a raw score of 24 has 
the z score equivalent of (24 − 31.7)/4.2 or about −1.8. The z score 
+1.3 means merely that its raw score equivalent is 1.3σ above the 
mean, and the z score −1.8 means that its raw score equivalent is 
1.8σ below the mean. In short, a standard score indicates how many 
standard deviations the corresponding raw score is from the mean 
of the series. 
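
The transformation can be sketched in a few lines. This is a minimal illustration, assuming Python; the function name `z_score` is introduced here. It applies definition (4.16) to the two raw scores of the A series discussed above, using M = 31.7 and σ = 4.2.

```python
def z_score(raw, mean, sd):
    """Standard score: deviation from the mean in standard deviation units."""
    return (raw - mean) / sd

M, SIGMA = 31.7, 4.2   # mean and standard deviation of the A series

for raw in (37, 24):
    print(f"raw score {raw}: z = {z_score(raw, M, SIGMA):+.1f}")
```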


* Owing to the practice of using the Greek letter σ to represent S.D., standard 
scores are sometimes referred to as sigma scores. 


Interpretation and Use of Standard Scores. Standard scores 
have several advantages in statistical theory and practice. They are 
algebraic and hence are tractable in mathematical discussion. Since 
a standard score is derived by dividing a deviation from the mean 
by the standard deviation, both of which are in the same unit, it 
is an abstract quantity, i.e., a quantity independent of the original 
measurement unit. This independence is the significant characteristic 
of standard scores. It is of considerable practical importance 
in educational testing, as will be demonstrated below. The 
mean and standard deviation of any series of standard scores are 
0 and 1, respectively, a fact which the student is asked to prove in 
exr. 49, p. 169. As our study proceeds, we shall find that the use of 
standard scores simplifies many statistical procedures. 


TABLE 4.10 


PERCENTAGES OF SCORES FALLING BELOW SELECTED z 
SCORES IN A NORMAL DISTRIBUTION 


               PERCENTAGE                  PERCENTAGE                  PERCENTAGE 
z = (X − M)/σ  OF SCORES    z = (X − M)/σ  OF SCORES    z = (X − M)/σ  OF SCORES 
               BELOW z                     BELOW z                     BELOW z 

    −3.0          .1%           −1.0         15.9%          +1.0         84.1% 
    −2.9          .2            − .9         18.4           +1.1         86.4 
    −2.8          .3            − .8         21.2           +1.2         88.5 
    −2.7          .3            − .7         24.2           +1.3         90.3 
    −2.6          .5            − .6         27.4           +1.4         91.9 
    −2.5          .6            − .5         30.8           +1.5         93.3 
    −2.4          .8            − .4         34.5           +1.6         94.5 
    −2.3         1.1            − .3         38.2           +1.7         95.5 
    −2.2         1.4            − .2         42.1           +1.8         96.4 
    −2.1         1.8            − .1         46.0           +1.9         97.1 
    −2.0         2.3              0          50.0           +2.0         97.7 
    −1.9         2.9            + .1         54.0           +2.1         98.2 
    −1.8         3.6            + .2         57.9           +2.2         98.6 
    −1.7         4.5            + .3         61.8           +2.3         98.9 
    −1.6         5.5            + .4         65.5           +2.4         99.2 
    −1.5         6.7            + .5         69.2           +2.5         99.4 
    −1.4         8.1            + .6         72.6           +2.6         99.5 
    −1.3         9.7            + .7         75.8           +2.7         99.6 
    −1.2        11.5            + .8         78.8           +2.8         99.7 
    −1.1        13.6            + .9         81.6           +2.9         99.8 
                                                            +3.0         99.9 

Standard scores are widely used in testing. When a distribution of 
test scores is normal or approximately so, standard scores reveal a 
great deal of information. A z score of 0 indicates a raw score at the 
mean; a positive z score, a raw score above the mean; and a negative 
z score, a raw score below the mean. A z score of 3.00 is very 
exceptional, since it is 3σ above the mean; and a z score of −3.00 
is very exceptional, since it is 3σ below the mean. Less than .3 per 
cent of the scores in a normal distribution deviate from the mean 
by as much as ±3σ. In Table 4.10, the percentages of scores in a 
normal distribution falling below the indicated z scores are shown. 
Thus, z scores in a normal distribution can easily be transformed 
to percentile scores and vice versa. (See pp. 211-212.) The student 
should study Fig. 4.7, Table 4.10, and Table A, Appendix C, until 
he can reconcile the three. 
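
The conversion between z scores and percentile scores in a normal distribution can also be sketched in code. The following is a minimal illustration, assuming Python 3.8 or later, whose standard `statistics` module provides `NormalDist`; the function names `percent_below` and `z_for_percentile` are introduced here for illustration. The results should agree with Table 4.10 to within rounding.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)

def percent_below(z):
    """Percentage of scores in a normal distribution falling below z."""
    return 100.0 * unit_normal.cdf(z)

def z_for_percentile(percentile):
    """z score below which the given percentage of a normal distribution falls."""
    return unit_normal.inv_cdf(percentile / 100.0)

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}: {percent_below(z):.1f} per cent below")

print(f"percentile 97.7 corresponds to z = {z_for_percentile(97.7):.2f}")
```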

Since they express the performance of an individual with refer- 
ence both to the mean and to the variability of his group, stand- 
ard scores tend to be more informative than other scores. Consider 
the two fictitious series of test scores below. The means of the series 
are equal, but the first series has a standard deviation of 6.5 as 
compared to 10.9 for the second. 


SERIES 1 SERIES 2 
70 60 
71 62 
72 66 
73 71 
74 76 
75 76 
76 80 
77 85 
78 90 
94 94 


The z score equivalent of 94 in the first series is (94 − 76)/6.5 or 
about 2.8. In the second series, however, 94 has a z score equivalent 
of about 1.7. A student having a score of 94 in the first series is 
more exceptional than a student having 94 in the second, in that 
the former is more unlike his group than the latter. This sort of 
exceptionalness is clearly reflected by standard scores. 
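
The figures for the two series can be verified directly. The sketch below is a minimal illustration, assuming Python and its standard `statistics` module; the eighth score of Series 1 is taken as 77, the value consistent with the stated mean of 76. Computed from unrounded standard deviations, the second z score comes out slightly below the rounded figure quoted in the text.

```python
from statistics import mean, pstdev

# Eighth entry of Series 1 is assumed to be 77 (consistent with a mean of 76).
series_1 = [70, 71, 72, 73, 74, 75, 76, 77, 78, 94]
series_2 = [60, 62, 66, 71, 76, 76, 80, 85, 90, 94]

def z_score(raw, scores):
    """z score of a raw score relative to its own series (divisor N)."""
    return (raw - mean(scores)) / pstdev(scores)

for name, scores in (("Series 1", series_1), ("Series 2", series_2)):
    print(f"{name}: M = {mean(scores)}, sigma = {pstdev(scores):.1f}, "
          f"z for 94 = {z_score(94, scores):.2f}")
```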

Since they are independent of the unit of measurement, standard 
scores are comparable scores, provided the distributions are normal 
or approximately so. By their use, an individual’s performance in 
one test can be compared with his performance in a second, regardless 
of differences in the measurement units. Since they are algebraic, 
the standard scores of an individual in several tests can be combined 
into a composite score. Let us illustrate both of the preceding points. 
Suppose that the raw scores of students A and B on four tests and 
that test means and standard deviations, based upon the group of 
which A and B are members, are as shown in Table 4.11. It is 
obvious that the raw scores on the four tests are neither comparable 
nor combinable. Unless some transformation is made, we cannot 
compare A's (or B's) performance in one test with his performance 
in another, nor can we compare A's performance with B's performance 
on the tests as a whole. It would be absurd to say that A’s performance 
in test 2 is better than his performance in the other tests 
or that B’s total performance on the four tests is better than A’s, 
since the raw scores are not comparable. 


TABLE 4.11 


RAW SCORES AND STANDARD SCORES OF STUDENTS A 
AND B IN 4 TESTS 


RAW SCORE    DEVIATION SCORE    STANDARD SCORE 

The standard scores corresponding to A's and B's raw scores are 
shown in the last column of Table 4.11. These scores are comparable 
and may logically be used in determining the test in which A's or 
B’s performance is best or poorest and may be combined to form 
composite scores on the four tests. They indicate that A’s performance 
is best in test 3 and poorest in test 2 and that since A’s mean 
z score is 1.5/4 or about .4, while B's is about −.1, A’s performance 
in the four tests as a whole is considerably better than B’s. 

Such comparisons as the above are demonstrably sound only if the 
distributions of raw scores are normal or approximately so. In 
the nonnormal distribution, the ratio of a deviation from the mean 
to the standard deviation, unlike the ratio in a normal distribution, 
does not have exact meaning. It is an observable fact, however, 
that the ratio has considerable stability of meaning except in 
markedly nonnormal distributions. The standard score transforma- 
tion may be applied to the majority of distributions encountered in 
educational testing without introducing serious error. 

The question frequently is raised as to whether the standard score 
method is better than the percentile method of deriving comparable 
scores from raw scores. There is no general answer to the question. 
In deciding which of the two to use in a given testing situation, their 
peculiar limitations and advantages have to be weighed in view of 
the purpose of the tests and the distributions of raw scores. 

The percentile method is easier to apply and percentile scores are 
more readily interpreted and more widely understood. The per- 
centile method is applicable to distributions of any shape. As we 
have seen, however, percentile units do not represent uniform raw 
score intervals from bottom to top of the scale. This disadvantage 
does not hold for standard scores. Standard scores can be averaged, 
although, since a composite score based upon several tests rarely 
provides as much information as the separate scores, this tends to 
be of little importance in practical testing work. Standard scores 
are more difficult to interpret than percentile scores, particularly 
in nonnormal distributions. It would seem that neither has unquali- 
fied superiority over the other. Percentile scores are generally ade- 
quate in practical work. In statistical theory, however, standard 
scores are by far the more useful. 

Converting Standard Scores to Positive Whole Numbers. 
The conversion of raw scores to standard scores results in decimal 
numbers, some of which are negative. In order to eliminate decimals 
and negative signs, standard scores frequently are multiplied by a 
constant and added to another constant, a widely used scheme 
being one in which the standard scores are multiplied by 10 and 
added to 50. 

Standard scores which have been multiplied by 10 and added to 
50 are designated by a capital Z, so that 


Z = 10z + 50, (4.17) 


in which z is defined by (4.16). When raw scores are normally dis- 
tributed, their Z score equivalents are identical to the well-known 
McCall T scores, as will be shown in Chapter V. The mean and 
standard deviation of a set of Z scores are 50 and 10, respectively. 

Various other methods of expressing standard scores are used. 
Some of these do not eliminate decimals, but do eliminate negative 
signs by adding a constant, such as 3 or 5, to each z score. The 
VAT and MAT scores of Table II, Appendix B, are subsets of large 
sets of standard scores which have been multiplied by 100 and added 
to 500. This particular form of the standard score is used extensively 
by the Educational Testing Service of Princeton, New Jersey. 

In general, if each z score in a series is multiplied by a constant H 
and added to a second constant K, the mean of the new series is K 
and the standard deviation is H, as shown in Appendix A. 
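
The general rule can be illustrated numerically. The following is a minimal sketch, assuming Python and its standard `statistics` module; the raw scores and the names `standard_scores` and `rescale` are hypothetical. It converts raw scores to z scores, applies formula (4.17), and applies the multiply-by-100, add-500 scheme mentioned above, printing the mean and standard deviation of each new series.

```python
from statistics import mean, pstdev

def standard_scores(raw):
    """z scores of a series, using the standard deviation with divisor N."""
    m, sd = mean(raw), pstdev(raw)
    return [(x - m) / sd for x in raw]

def rescale(z_scores, h, k):
    """Multiply each z score by the constant h and add the constant k."""
    return [h * z + k for z in z_scores]

raw = [12, 15, 18, 21, 24]               # hypothetical raw scores
z = standard_scores(raw)
big_z = rescale(z, 10, 50)               # Z = 10z + 50, formula (4.17)
scaled_100_500 = rescale(z, 100, 500)    # the 100z + 500 scheme

for label, series in (("z", z), ("Z", big_z), ("100z + 500", scaled_100_500)):
    print(f"{label}: mean = {mean(series):.1f}, sigma = {pstdev(series):.1f}")
```

Whole numbers are then obtained by rounding the rescaled scores.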

Accuracy of Standard Scores. Since in the formula z = x/σ, 
x is the least accurate number, a standard score has as many significant 
digits as x. Application of this fact, however, would result 
in ragged decimals, i.e., some standard scores would be reported to 
ones, some to tenths, others possibly to hundredths, depending 
upon the number of digits in x and the magnitude of σ. To avoid 
ragged decimals, therefore, we recommend the following practice: 
When raw scores are given to one-figure or two-figure accuracy, 
report standard scores to tenths; when raw scores are given to three- 
figure accuracy, report standard scores to hundredths; and so on. 


Exercises 


43. The mean of the Arithmetic Fundamentals Test scores, School A, 
Table 1, Appendix B, is approximately 31.7, and the standard deviation 
4.2. Transform the 23 scores into standard scores. Could these standard 
scores be combined with the standard scores of the students on the 
other tests listed in the table? If some of the distributions of scores on 
the other tests are markedly nonnormal, will the composite standard 
score have clear meaning? Explain. 

44. In analyzing a distribution of test scores a teacher computed both 
percentile and standard scores. He found that a z score of 0 corresponded 
to a percentile score of 55. Was the distribution symmetrical? In what 
sort of distribution will a z score of 0 correspond to a percentile of 50? 

45. The distribution of raw scores and their corresponding percentile scores 
of 67 dental school students on a hand-eye co-ordination test are shown 
below. The mean and standard deviation of the distribution are 21.4 
and 2.4, respectively. 


                 PERCENTILE 
SCORE     f      RANK 
26        1      99.3 
25        4      95.5 
24        5      88.8 
23        7      79.8 
22       10      67.2 
21       10      52.2 
20       15      33.6 
19        5      18.7 
18        4      11.9 
17        3       6.7 
16        2       3.0 
15        0       1.5 
14        1        .7 


a. Find the z and Z score equivalents of the raw scores. 

b. Show that the histograms or the frequency polygons of the raw and 
standard scores are similar in shape. 

c. What are the advantages of expressing the raw scores as percentile 
scores? 

d. What are the advantages of expressing the raw scores as standard 
scores? 

e. What are the percentile score differences and the standard score 
differences which correspond to the raw score differences: 15 − 14, 
18 − 17, 21 − 20, 25 − 24? 

46. Is it possible for a z score of 1.5 to represent the highest score in a 
distribution? Is it possible for a z score to have a value of 4.0 or more? 
Explain. 

47. What information can we deduce from a z score which we cannot 
deduce from a percentile score? 

48. What information can we deduce from a percentile score which we 
cannot deduce from a z score? 

49. Show that the mean of a set of z scores is 0 and the standard deviation 1. 

50. Show that the mean of a set of Z scores is 50 and the standard deviation 
10. 

51. The mean and standard deviation of a series of heights, measured in 
inches, are 64.0 in. and 2.0 in., respectively. What is the standard score 
in height of an individual whose height is 60 in.? What would be his 
standard score if the heights were measured in feet? 

52. The mean and standard deviations of the heights and weights of a 
group are 68.0 in. and 2.5 in. and 150.0 lbs. and 10.0 lbs., respectively. 
If an individual's height is 72 in. and his weight 180 lbs., why is the 
individual “heavier than tall” with respect to the group? 

53. In the above distributions of weights and heights, an individual has a 
standard score in height of 1.5 and a standard score in weight of .5. 
What are his height in inches and his weight in pounds? 

54. A student has a standard score of —1.8 in a distribution of scores in 
which M equals 75.0 and σ equals 7.5. What is his raw score? 


Interpretation and Use of Measures of Variability 


It has been said that the most popular use of an average is to con- 
ceal variability. Although the remark is facetious, an average does 
conceal variability, and hence, unless supplemented by other in- 
formation, presents an indefinite and perhaps distorted picture. As 
has been emphasized in the preceding pages, the extent and manner 
of the scattering of items about their average value should always 
be taken into account in describing and analyzing a series. 

In this section we shall summarize the more important properties 
and uses of measures of variability in analyzing statistical data. 

Properties of Measures of Variability. Measures of variability 
are merely statistics which summarize the amount or extent of 
variation. Each measure differs from the others because each summarizes 
variation in a different way. In contrast with the point 
nature of averages, measures of variability may best be interpreted as 
distances on the scale of scores. The interpercentile measures, as derived, 
are such distances; the average and standard deviations may 
be thought of as such. In a normal distribution, the mean, median, 
and mode coincide, i.e., they have the same value. Although this 
is not true of the measures of variability, the latter do have a con- 
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stant relationship. In the normal distribution, it can be shown that 
the following relationships obtain: 


с = 1.253AD с = 1.4830 
AD = .198c AD = 1.1830 
0 = .615c 0 = .845AD 


It also is true in normal distributions that σ, AD, and Q distances
from the mean include the following percentages of items (cf. Fig.
4.7):


1σ below mean to 1σ above mean includes about 68 per cent.
2σ below mean to 2σ above mean includes about 95 per cent.
3σ below mean to 3σ above mean includes about 99.7 per cent.


1AD below mean to 1AD above mean includes about 58 per cent.
2AD below mean to 2AD above mean includes about 89 per cent.
3AD below mean to 3AD above mean includes about 98 per cent.


1Q below mean to 1Q above mean includes 50 per cent.
2Q below mean to 2Q above mean includes about 82 per cent.
3Q below mean to 3Q above mean includes about 96 per cent.
4Q below mean to 4Q above mean includes about 99.3 per cent.
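
The constants and percentages listed above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard `statistics.NormalDist`; the helper name `pct_within` is an illustrative choice and does not come from the text.

```python
import math
from statistics import NormalDist

z = NormalDist()  # unit normal curve: mean 0, sigma 1

# In a normal distribution AD = sigma * sqrt(2/pi) and Q = sigma * z(.75).
AD = math.sqrt(2 / math.pi)   # average deviation in sigma units
Q = z.inv_cdf(0.75)           # quartile deviation in sigma units

def pct_within(k, unit):
    """Per cent of cases lying within k units on either side of the mean."""
    d = k * unit
    return 100 * (z.cdf(d) - z.cdf(-d))

print(f"sigma = {1 / AD:.3f} AD = {1 / Q:.3f} Q")
print(f"AD = {AD:.3f} sigma = {AD / Q:.3f} Q")
print(f"Q = {Q:.3f} sigma = {Q / AD:.3f} AD")

for name, unit, ks in (("sigma", 1.0, (1, 2, 3)),
                       ("AD", AD, (1, 2, 3)),
                       ("Q", Q, (1, 2, 3, 4))):
    for k in ks:
        print(f"{k}{name}: {pct_within(k, unit):.1f} per cent")
```

The printed figures agree with the tabled ones to the number of places the text reports.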


Although these facts hold exactly only for normal distributions, 
it is a matter of common experience that, for the great majority of 
distributions encountered in education work, they hold sufficiently 
to be generally useful in interpretation and analysis. If we have, for 
example, a large, approximately normal distribution of measures 
of some ability, with a mean of, say, 70 and a standard deviation of 
10, we know that the measures range from about 40 to 100, that 
about 95 per cent of them fall in the interval 50-90, and about 2/3 
in the interval 60—80. If we have a second distribution of measures 
of the same ability with a mean of 70 and a standard deviation of, 
say, 5, we know that the first distribution includes a great many 
more able and less able individuals. (If desired, Q or AD instead of
σ intervals can be employed in similar analysis.)
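
The comparison of the two distributions can be made concrete with a short sketch. It assumes both distributions are exactly normal, and the cut-off scores 55 and 85 are hypothetical values chosen only to mark off "less able" and "more able" individuals.

```python
from statistics import NormalDist

wide = NormalDist(mu=70, sigma=10)    # first distribution in the text
narrow = NormalDist(mu=70, sigma=5)   # second distribution in the text

def share_between(dist, lo, hi):
    """Proportion of a normal distribution lying between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)

def share_outside(dist, lo, hi):
    """Proportion lying below lo or above hi."""
    return 1 - share_between(dist, lo, hi)

print(round(share_between(wide, 60, 80), 3))   # roughly two thirds
print(round(share_between(wide, 50, 90), 3))   # roughly 95 per cent
# Hypothetical cut-offs: scores below 55 or above 85.
print(round(share_outside(wide, 55, 85), 4), round(share_outside(narrow, 55, 85), 4))
```

With the same mean, the distribution having the larger standard deviation places far more cases beyond any fixed pair of extreme scores.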

The measures of skewness and kurtosis which we shall consider 
in the next section provide quantitative description of nonnormal 
variation and are helpful in determining whether a distribution 
departs too greatly from normality to be analyzed by standard 
methods.

Reliability of the Quartile, Decile, Average, and Standard
Deviations. The question of which measure of variability is the
most reliable means, essentially, which measure fluctuates least from
sample to sample drawn from the same population. It can be demonstrated
logically that, in sampling from a normal population, the
standard deviation has the greatest reliability. The demonstration
is beyond the scope of this book, but we can examine the matter
empirically. In Table 3.6, p. 98, we have the distributions of 10
samples drawn at random from a normal population. The quartile,
average, and standard deviations and the P₉₀ − P₁₀ range or D
of the 10 samples are recorded in Table 4.12, and are shown graphically
in Fig. 4.8. The table and illustration are instructive in several
ways. First, although the profiles of the four measures tend to show
the same ups and downs, there are several exceptions. Since we
know the population values in this case (something we do not know,
of course, in real sampling problems) we can examine the fluctuations
of the measures about their true values. As indicated by the profiles
and the percentage of error, Q provides the best estimate of population
variability only in sample 3. The mean percentage of error of
the Q estimates is 12.9 per cent. The D and AD estimates are about
equally good on the average, the mean percentage of error of the
former being 6.8 per cent, that of the latter 7.1 per cent. The σ's
provide the best estimates in 5 of the 10 samples and have a mean
percentage of error of 4.5 per cent, substantially less than the
others.

Fig. 4.8. Sampling fluctuations of Q, D, AD, and σ. Heavy lines indicate
population values. (From Table 4.12.)
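
The empirical comparison can be repeated on a larger scale by simulation. The sketch below is an illustration under stated assumptions, not a reproduction of Table 4.12: the population mean of 50, the sample size of 50, the 1,000 samples, and the random seed are all hypothetical choices, and `statistics.quantiles` (Python 3.8 or later) uses an ungrouped interpolation rule rather than the grouped-data percentile method of the text.

```python
import math
import random
from statistics import NormalDist, pstdev, quantiles

POP_SIGMA = 10.0
z = NormalDist()

# Population values of each measure for a normal population with sigma = 10.
TRUE = {
    "Q": POP_SIGMA * z.inv_cdf(0.75),            # (P75 - P25) / 2
    "D": POP_SIGMA * 2 * z.inv_cdf(0.90),        # P90 - P10
    "AD": POP_SIGMA * math.sqrt(2 / math.pi),
    "sigma": POP_SIGMA,
}

def measures(sample):
    """Q, D, AD, and sigma computed from one sample."""
    q1, _, q3 = quantiles(sample, n=4)
    deciles = quantiles(sample, n=10)
    m = math.fsum(sample) / len(sample)
    return {
        "Q": (q3 - q1) / 2,
        "D": deciles[-1] - deciles[0],
        "AD": math.fsum(abs(x - m) for x in sample) / len(sample),
        "sigma": pstdev(sample),
    }

def mean_pct_error(n_samples=1000, n=50, seed=1):
    """Mean percentage of error of each measure about its population value."""
    rng = random.Random(seed)
    totals = dict.fromkeys(TRUE, 0.0)
    for _ in range(n_samples):
        sample = [rng.gauss(50, POP_SIGMA) for _ in range(n)]
        for name, value in measures(sample).items():
            totals[name] += abs(value - TRUE[name]) / TRUE[name]
    return {name: 100 * t / n_samples for name, t in totals.items()}

errors = mean_pct_error()
for name, err in errors.items():
    print(f"{name}: mean error {err:.1f} per cent")
```

Over many samples the standard deviation should show a smaller mean percentage of error than Q, which is the pattern the ten samples of the text suggest.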

Ordinarily we do not know either the form of the population 
distribution or the true variation, and hence have no way of ex- 
amining the consequences of using a particular measure as an esti- 
mate of the population variation. The most that can be said in this 
connection is that if there is reason to believe that the population 
distribution is normal or approximately so, the standard deviation 
of the sample is preferred as the estimate of population variation 
on the basis that the odds favor it to provide the best estimate. 

Appropriate Applications. In deciding which measure of vari- 
ability to use in a given situation, several considerations need to be 
taken into account, although it may not be possible to meet all of 
them. 

In reporting to people untrained in statistics, the range or average 
deviation will be best understood. If a series is such that the median 
is the appropriate average, Q or some other percentile measure
ordinarily should be used. Likewise, the mean and standard devia- 
tion are usually used together. In many situations it is advisable to 
use two measures of variability, one for simple descriptive purposes, 
the other as a basis for more exact and extended analysis. 

As in the case of averages, the best aid in selecting a measure of
variability is familiarity with the advantages and limitations of the 
various measures, recognition of the purposes to be accomplished, 
and precaution against presenting a misleading picture. 

In summary statement: 


The over-all range is too much affected by the chance position of the 
highest and lowest values in a series, and tells too little about the varia- 
tion of intermediate items to be useful except for rough purposes. It is 
chiefly useful as a supplementary measure.

The interpercentile range measures, such as Q and D, team with the
median. They are not affected by extreme values and are usually applicable
to open-end distributions and to distributions having unequal
class intervals. Ordinarily, the points Q₁ and Q₃ or P₁₀ and P₉₀ should be
reported as well as Q or D.

The average deviation is the most direct and easily interpreted repre- 
sentative measure. It is affected by all deviations, but is not as sensitive 
to extreme deviations as the standard deviation. It tends to be almost 
as reliable as the standard deviation, in the sense of being character- 
ized by little fluctuation from sample to sample drawn from the same 
population. 

The standard deviation is the most used and useful of the measures. 
It is generally the most reliable, it enters into further statistical analysis
at various points, and is tractable in mathematical discussion. In the 
latter respect, it is unique among measures of variability. It teams with 
the mean. As a rule it should be used unless there is good reason for not 
using it. 

The coefficient of variation or relative standard deviation is appropriate
in comparing series having unequal means and series expressed
in dissimilar units. It can be a capricious measure in the hands of the 
unwary and should be used and interpreted with caution. 


Accuracy. Thus far nothing has been said about the number of 
digits to retain in measures of variability. The percentile measures, 
such as Q and D, being based upon counting, theoretically are exact; 
like the median, however, they are not exact in a practical sense. 
The average and standard deviations, since they are derived from 
differences between scores and mean score, are approximate. How
many digits to retain in reporting these is a persistent and trouble- 
some question. 

The definition AD = Σ|x|/N indicates that the average deviation
will have as many significant digits as Σ|x|. Since this is always
explicitly determined, it is a simple matter to determine the number 
of digits to retain in an average deviation. For example, in the 
illustrative problem, p. 139, the absolute sum of the deviations 
has three significant digits; hence, the average deviation is reported 
to three-figure accuracy. 

The definition σ = √(Σx²/N) indicates that the standard deviation
will have as many significant digits as Σx². In practice, however, we
usually use formula (4.8) or group the data and use formula (4.9),
and consequently do not explicitly calculate Σx². Let us examine the
values we substitute in formula (4.8) in computing the standard
deviation of the IQ's entered in Table 4.4. For this series, ΣX² is
159,499 and ΣX is 1,685, so that NΣX² is 2,870,982 and (ΣX)² is
2,839,225, the bars indicating the first doubtful digits in the products.
When we make the subtraction called for in the formula, we
obtain 31,757. Since a root generally has as many significant digits
as the number, and since 1/N is exact, the computed standard
deviation 9.9003 . . . should be rounded to 9.9.
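
The arithmetic of this paragraph can be retraced in a few lines. This is a sketch resting on two assumptions: that formula (4.8) is the raw-score form σ = √(NΣX² − (ΣX)²)/N, which is what the quoted products imply, and that N = 18, a value inferred from the product NΣX² rather than stated in this passage.

```python
import math

# Sums quoted in the text for the IQ series of Table 4.4.
sum_x = 1_685
sum_x2 = 159_499
n = 18  # inferred from N * sum_x2 = 2,870,982; not stated in this passage

def sd_from_sums(n, sum_x, sum_x2):
    """Raw-score form: sigma = sqrt(N * sum(X^2) - (sum X)^2) / N."""
    return math.sqrt(n * sum_x2 - sum_x ** 2) / n

difference = n * sum_x2 - sum_x ** 2   # the quantity under the radical
sigma = sd_from_sums(n, sum_x, sum_x2)
print(difference)
print(round(sigma, 4), "rounds to", round(sigma, 1))
```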

The standard deviation is a complex function of a statistical
series, and no rule-of-thumb can be depended upon. It generally
contains one or two fewer significant digits than the mean. However,
there are advantages in reporting the standard deviation with the
same precision as the mean. For this reason, and in the interests of
simplicity and uniformity, the following arbitrary practice is
suggested:

Report all measures of variability to the same degree of precision as 
the mean. Thus, if a mean is reported to tenths, measures of variability 
will be reported to tenths; if the mean is reported to hundredths, meas- 
ures of variability will be reported to hundredths, and so on. 


The practice tends to exaggerate the accuracy of measures of vari- 
ability, but it is easy to follow and ordinarily gives reasonably good 
results. 

Quick Approximations of Measures of Variability. The 
various interrelationships between percentiles and measures of 
variability suggest methods of approximating the latter quickly 
in a distribution which is sensibly normal. Such methods may be 
useful in (1) detecting gross error in computed measures of vari- 
ability, (2) roughly comparing the variability of several distribu- 
tions in a minimum of time, (3) approximating measures of vari- 
ability in a long ungrouped series in which the items are listed in 
order of size, and (4) estimating population variability from a 
sample. 

Among the simpler methods of approximating σ are:


Method 1. Multiplying the over-all range by .18.

Method 2. Multiplying the interpercentile range P₉₃ − P₇ by .34.

Method 3. Multiplying the expression (P₉₇ + P₈₅ − P₁₅ − P₃) by .17.

Method 4. Multiplying the expression (P₉₈ + P₉₁ + P₈₀ − P₂₀ − P₉ − P₂)
by .12.

Methods 3 and 4 are particularly appropriate for approximating σ
of a distribution of test scores which are reported in percentile
norms. After σ has been approximated, other measures of variability
may be obtained through the relationships given on p. 170.

To apply a method we need only to determine the range or the 
percentiles called for and divide or multiply as indicated above. 
The student can verify that, for the distribution of Table 4.7, in 
which computed σ is 8.84, approximation of σ by method (1) gives
9.2; by method (2), 9.0; by method (3), 8.7; and by method (4),
9.0. It is usually the case that method (4) gives the best results,
method (3) next best, and method (1) the poorest. 
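
The percentile methods are easy to try out. The sketch below reads Method 2 as .34(P₉₃ − P₇) and Method 3 as .17(P₉₇ + P₈₅ − P₁₅ − P₃), and checks them against the exact percentiles of a hypothetical normal curve with M = 70 and σ = 10; the function names and the test curve are illustrative assumptions, not part of the text.

```python
from statistics import NormalDist

def approx_sigma_method2(p):
    """Method 2: .34 times the P93 - P7 range; p maps percentile rank to score."""
    return 0.34 * (p[93] - p[7])

def approx_sigma_method3(p):
    """Method 3: .17 times (P97 + P85 - P15 - P3)."""
    return 0.17 * (p[97] + p[85] - p[15] - p[3])

# Hypothetical check: exact percentiles of a normal curve with M = 70, sigma = 10.
curve = NormalDist(mu=70, sigma=10)
percentiles = {r: curve.inv_cdf(r / 100) for r in (3, 7, 15, 85, 93, 97)}

m2 = approx_sigma_method2(percentiles)
m3 = approx_sigma_method3(percentiles)
print(round(m2, 2), round(m3, 2))
```

Both approximations should land close to the true σ of 10, which is why the multipliers work for any distribution that is sensibly normal.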

During recent years considerable study of the efficiency of ap- 
proximation methods has been made, with special reference to the 
ungrouped series, each item of which is recorded on an IBM punch 
card. The methods frequently are useful in estimating population 
variability from a sample. (See Ref. 3, Chap. 15.) 


Exercises 


55. Determine the approximate ranges of the distributions of Table 3.6 
and contrast their fluctuations with those of Q, AD, and σ as given in
Table 4.12. 

56. The ranges, Q's, AD's and σ's of the IQ distributions in Table 4.1 as
computed in previous pages are: 


DISTRIBUTION    RANGE       Q       AD        σ
B                 33       9.28     9.04    10.02
C                 57       8.64    11.40    14.13
а                 val     11.88    12.49    15.96
J                 40       5.70     7.50     9.70


With reference to the distributions and the properties of the measures 
account for the contradictory evidence regarding variability. 

57. If you were called upon to discuss with a PTA the extent of individual 
differences in, say, reading comprehension, which measure or measures 
of variability would you use? 

58. Given a distribution of scores of 60 students on an algebra achievement 
test, state a specific purpose for which (a) the range would be appropriate;
(b) the quartile deviation; (c) the average deviation; (d) the
standard deviation; (e) the coefficient of variation. 


59. Describe a set of data for which (a) the quartile deviation would
be most appropriate; (b) the average deviation; (c) the standard 
deviation. 

60. Try to find or construct one or more frequency distributions in which 
the interval ± 1σ from the mean includes less than, say, 60 per cent of
the items or more than 76 per cent. Sketch the histogram or polygon of 
the distribution(s). 

61. In a large, approximately normal distribution, P₉₃ is 88.5 and P₇ is
44.5. What are the approximate values of σ, Q, and AD?

62. The standard deviation of a large, approximately normal distribution, 
as computed by a student, was .80. The over-all range of the distribu- 
tion was 40. Comment. 

63. Among the percentile norms reported by a test constructor were P₉₇ =
65, P₈₅ = 58, P₁₅ = 32, P₃ = 26. The standard deviation reported was
15.0. Comment. 

64. How would you approximate the standard deviation of a distribution 
whose percentile curve is given? Illustrate by reference to Fig. 4.2, 
p. 128. 


Skewness and Kurtosis 


In an earlier chapter, pp. 70-73, the meaning of skewness and
kurtosis was touched upon. We are now ready to examine the con- 
cepts more fully. Skewness and kurtosis depend upon the manner 
in which the scores in a series scatter about the average value. When 
the scatter is greater on one side of the point of central tendency 
than on the other, the distribution is skewed. When there is high 
concentration of the scores in the neighborhood of the point of 
central tendency, the distribution is relatively narrow across the 
shoulders, or leptokurtic; when there is low concentration of scores 
in the neighborhood of the point, the distribution is relatively broad 
across the shoulders, or platykurtic.

There are several rough methods of judging whether a distribu- 
tion lacks normal symmetry or peakedness. Skewness may readily 
be detected by inspection of the frequency polygon. It ordinarily 
is impossible to detect departure from normal peakedness in this 
way, however, since apparent peakedness may result from choice 
of dimensions for the polygon. When quartile or standard deviation 
intervals do not include “normal” percentages of cases, the distribution
is either skewed or abnormally peaked. Such methods,
although sometimes useful in preliminary analysis, are rough and
do not permit conclusive comparisons. In order to describe and com- 
pare distributions exactly, we must, of course, have measures of 
skewness and kurtosis as well as measures of central tendency and 
variability. 

Nonalgebraic Measures of Skewness. As we have seen, there 
are various useful measures of central tendency and of variability. 
There also are various measures of skewness and kurtosis. The sim- 
pler of these are based upon relationships between averages, per- 
centiles, and measures of variability. 

Since it is affected by the magnitude of each score in a series, the 
arithmetic mean will not coincide with the median unless the scores 
are distributed symmetrically. In a skewed distribution, the mean 
is pulled more than the median toward the skewed side. The greater 
the difference between the mean and the median, the greater the 
skewness. 

The frequency polygon of a skewed distribution is shown in Fig. 
4.9. The mean of the distribution is 550.37 and the median 537.62. 
It will be noted that the mean here, as always, is pulled more than 
the median toward the skewed side. 

Although any difference between the mean and median of a dis- 
tribution indicates skewness, the difference cannot be used in com- 
paring two distributions because it is expressed in the unit of the 
original measures. For example, the skewness in a distribution of 
heights measured to centimeters would be about 2½ times the
skewness of the same heights measured to inches, if the difference 
between mean and median were employed as the criterion. 

One of the widely used measures of skewness is that obtained by
dividing the quantity 3(M − Mdn) by σ. This measure is zero when
the distribution is symmetrical, negative when it is skewed to the
left, and positive when it is skewed to the right. (Why?) Since it is
a pure or abstract number (independent of the unit of measurement)
it may be used to compare the skewness of two or more distributions.
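
The point about units can be illustrated with a small sketch. The heights below are hypothetical data invented for the illustration, and `pearson_skewness` is simply a name given here to the measure 3(M − Mdn)/σ.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """3(M - Mdn) / sigma: zero for symmetry, positive for right-hand skewness."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Hypothetical heights in inches, with a long tail toward the tall end.
inches = [60, 61, 61, 62, 62, 62, 63, 63, 64, 66, 69, 74]
centimeters = [2.54 * h for h in inches]

gap_in = mean(inches) - median(inches)
gap_cm = mean(centimeters) - median(centimeters)
sk_in = pearson_skewness(inches)
sk_cm = pearson_skewness(centimeters)

print(round(gap_cm / gap_in, 2))          # the raw difference grows with the unit
print(round(sk_in, 3), round(sk_cm, 3))   # the ratio to sigma does not
```

Dividing by σ removes the unit, so the same series gives the same skewness whether it is recorded in inches or in centimeters.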

Another simple measure, expressed entirely in terms of percentiles,
is given by the formula

$$Sk_p = \frac{P_{10} + P_{90} - 2P_{50}}{P_{90} - P_{10}} \tag{4.18}$$

and is commonly called the percentile measure of skewness. The
values of Skₚ vary from −1 to +1. It is left as an exercise for the
student to show that Skₚ is zero for symmetry, negative for left-hand
skewness, and positive for right-hand skewness. Like the
measure of the preceding paragraph, Skₚ is abstract and may be
used to compare the skewness of two or more distributions.


[Figure: frequency polygon; horizontal axis, Score (344.5 to 794.5); vertical axis, Frequency.]
Fig. 4.9. Positions of mean and median in the skewed distribution.
(From Table 2.3, p. 41.)


Formula (4.18) is easily applied. For the distribution of VAT
scores shown in Fig. 4.9, the quantities to be substituted are P₁₀ =
440.07, P₅₀ = 537.62, P₉₀ = 681.79, so that

$$Sk_p = \frac{440.07 + 681.79 - 2(537.62)}{681.79 - 440.07} = .19.$$
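
The substitution can be written as a small function. This is a minimal sketch; the function name is an illustrative choice, and the three percentiles are those quoted for the VAT scores.

```python
def percentile_skewness(p10, p50, p90):
    """Formula (4.18): (P10 + P90 - 2*P50) / (P90 - P10)."""
    return (p10 + p90 - 2 * p50) / (p90 - p10)

sk_p = percentile_skewness(440.07, 537.62, 681.79)
print(round(sk_p, 2))

# Limiting cases: a median midway between P10 and P90, and a median at P10.
print(percentile_skewness(10, 20, 30), percentile_skewness(10, 10, 30))
```

The second line of output shows why the measure is zero for symmetry and cannot exceed +1: the median can lie no lower than P₁₀.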


Nonalgebraic Measures of Kurtosis. When we inspect lepto- 
kurtic and platykurtic distributions like those of Fig. 2.14 and 2.15, 
we note that the distinguishing feature is the manner in which the 
values spread over the range. In general, the greater the proportion 
of values in the vicinity of the average, the more peaked the distri- 
bution. This fact suggests that kurtosis can be measured by com- 
paring the spread of some specified middle percentage of the values 
with the spread of a larger percentage.

Among the various simple measures of kurtosis which have been
proposed at one time or another, the most widely used and serviceable
is based upon the ratio of the range of the middle 50 per cent
of the values to the range of the middle 80 per cent. This percentile
measure is written

$$Ku_p = \frac{P_{75} - P_{25}}{2(P_{90} - P_{10})} = \frac{Q}{D} \tag{4.19}$$

The degree of kurtosis of a normally peaked or mesokurtic dis- 
tribution, as given by (4.19), is .263, a fact which the student is 
asked to verify in a later chapter. A distribution for which Kuₚ is
less than .263 is leptokurtic, and one for which Kuₚ is greater than
.263 is platykurtic. It will be noted that Kuₚ, like the percentile
measure of skewness, is independent of the unit of measurement. 

Let us apply the formula to the distribution of VAT scores of
Fig. 4.9. The values for substitution are P₁₀ = 440.07, P₂₅ = 485.12,
P₇₅ = 616.38, P₉₀ = 681.79, so that

$$Ku_p = \frac{616.38 - 485.12}{2(681.79 - 440.07)} = .271.$$

Hence, the distribution is slightly platykurtic as well as positively 
skewed. As a matter of fact, most observed distributions show both 
skewness and nonnormal peakedness. 
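
Formula (4.19) and its normal-curve value can both be checked with a short sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the function name is illustrative only.

```python
from statistics import NormalDist

def percentile_kurtosis(p10, p25, p75, p90):
    """Formula (4.19): Q / D = (P75 - P25) / (2 * (P90 - P10))."""
    return (p75 - p25) / (2 * (p90 - p10))

# VAT scores of Fig. 4.9, using the percentiles quoted in the text.
ku_vat = percentile_kurtosis(440.07, 485.12, 616.38, 681.79)

# The same measure for the unit normal curve (the mesokurtic reference value).
z = NormalDist()
ku_normal = percentile_kurtosis(*(z.inv_cdf(r) for r in (0.10, 0.25, 0.75, 0.90)))

print(round(ku_vat, 4), round(ku_normal, 3))
```

The VAT value comes out at about .27, a little above the normal-curve value, which is the sense in which the distribution is slightly platykurtic.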

It should be remarked that the nonalgebraic measures of skew- 
ness and kurtosis are chiefly useful as descriptive measures. The 
extent to which they may vary, owing to chance, in sampling from 
a normal population can only be roughly determined. Hence, they 
are not generally satisfactory criteria of population normality. 

Measures of Skewness and Kurtosis Based upon Moments. 
The simple measures of skewness and kurtosis, described above, 
although sensitive to the lack of symmetry and lack of normal 
peakedness, are not as reliable or as generally useful as the algebraic 
measures based upon moments. 

In mechanics the term moment is used to denote a measure of the 
tendency of a force to cause rotation of an object about a point. 
Since the strength of the tendency depends upon the amount of the 
force and the distance from the point at which the force acts, a mo- 
ment is the product of force times distance. When the sum of the 
moments tending to cause rotation in one direction is equal to the
sum of the moments tending to cause rotation in the opposite direction,
the object is in balance.

Now an item of a statistical series may be thought of as a unit
force acting at a distance x from the arithmetic mean, i.e., as a
moment of force. Since the sum of negative deviations from the
mean is equal to the sum of positive deviations, the mean is analogous
to a point of balance. In statistics the algebraic sum of the
distances or deviations from the mean divided by N, Σx/N, is
called the first moment of the series.

When the deviations x of the items are squared, summed, and
divided by N, the quotient is called the second moment of the series.
The third and fourth moments are based upon the third and fourth
powers, respectively, of the deviations. These moments about the
arithmetic mean conventionally are designated by the Greek letter
μ (mu) and appropriate subscript. In this notation the first four
moments of a series are

$$\mu_1 = \frac{\Sigma x}{N} = 0; \quad \mu_2 = \frac{\Sigma x^2}{N} = \sigma^2; \quad \mu_3 = \frac{\Sigma x^3}{N}; \quad \mu_4 = \frac{\Sigma x^4}{N}.$$

It will be noted that μ₂ is equal to the variance of the series.

The computation of the first four moments about the mean of 
the Arithmetic Fundamentals scores, School B, Appendix B, is 
shown in Table 4.13. 

The moments of a statistical series are important because they 
provide precise and sensitive measures of the degree to which the 
series departs from normal form. (In advanced statistics, the moments
are used to distinguish various types of frequency distributions.)
The measure of skewness based upon moments, commonly
designated by the symbol α₃* (alpha three), is given by the formula

$$\alpha_3 = \frac{\mu_3}{\mu_2\sqrt{\mu_2}} \tag{4.20}$$

and the measure of kurtosis α₄* (alpha four) by the formula

$$\alpha_4 = \frac{\mu_4}{\mu_2^2} \tag{4.21}$$


* The symbol √β₁ is sometimes used instead of α₃ to identify this measure
of skewness and the symbol β₂ instead of α₄ for kurtosis. (Despite the modern
trend toward using Greek letters to denote population characteristics, the
moment measures in the sample are still generally denoted by Greek letters.)

TABLE 4.13
COMPUTATION OF THE FIRST FOUR MOMENTS ABOUT
THE MEAN

SCORE X       x         x²            x³             x⁴
   30         2.5       6.25        15.625         39.0625
   16       −11.5     132.25    −1,520.875     17,490.0625
   28          .5        .25          .125           .0625
   27         −.5        .25         −.125           .0625
   28          .5        .25          .125           .0625
   32         4.5      20.25        91.125        410.0625
   34         6.5      42.25       274.625      1,785.0625
   29         1.5       2.25         3.375          5.0625
   28          .5        .25          .125           .0625
   14       −13.5     182.25    −2,460.375     33,215.0625
   28          .5        .25          .125           .0625
   27         −.5        .25         −.125           .0625
   23        −4.5      20.25       −91.125        410.0625
   32         4.5      20.25        91.125        410.0625
   32         4.5      20.25        91.125        410.0625
   31         3.5      12.25        42.875        150.0625
   33         5.5      30.25       166.375        915.0625
   23        −4.5      20.25       −91.125        410.0625
SUM 495       0       510.50    −3,387.000     55,650.1250

M = ΣX/N = 495/18 = 27.5
μ₁ = Σx/N = 0/18 = 0
μ₂ = Σx²/N = 510.50/18 = 28.36
μ₃ = Σx³/N = −3,387.000/18 = −188.2
μ₄ = Σx⁴/N = 55,650.125/18 = 3,092


When we substitute the appropriate values from Table 4.13 in
formulas (4.20) and (4.21) we have

$$\alpha_3 = \frac{-188.2}{28.36\sqrt{28.36}} = -1.25; \qquad \alpha_4 = \frac{3{,}092}{(28.36)^2} = 3.84.$$


In a normal distribution α₃ = 0 and α₄ = 3. A negative value of
α₃ indicates left-hand skewness; a positive value, right-hand skewness,
as in the various simple measures of skewness; and the greater
the departure from zero the greater the left or right skewness, as
the case may be. When α₄ exceeds 3, the distribution is leptokurtic.
When α₄ is less than 3, the distribution is platykurtic. The greater
the difference between α₄ and 3, the more pronounced is the platykurtosis
or the leptokurtosis, as the case may be. Thus, the series
of Table 4.13, in which α₃ = −1.25 and α₄ = 3.84, is skewed to
the left and is leptokurtic.

It will be noted that the alpha measures of skewness and kurtosis
are abstract numbers, and hence may be used in comparing the 
skewness and kurtosis of two or more distributions. 
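
The moment computations for an ungrouped series are mechanical and can be sketched directly. The scores below are those of Table 4.13 as read from the table and checked against its column totals; the function name `central_moment` is an illustrative choice.

```python
# Scores of Table 4.13 (N = 18, sum = 495, M = 27.5).
scores = [30, 16, 28, 27, 28, 32, 34, 29, 28, 14, 28, 27, 23, 32, 32, 31, 33, 23]

def central_moment(values, k):
    """k-th moment about the arithmetic mean: sum(x**k) / N, with x = X - M."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

mu1, mu2, mu3, mu4 = (central_moment(scores, k) for k in (1, 2, 3, 4))

alpha3 = mu3 / (mu2 * mu2 ** 0.5)   # formula (4.20)
alpha4 = mu4 / mu2 ** 2             # formula (4.21)

print(round(mu2, 2), round(mu3, 1), round(mu4))
print(round(alpha3, 2), round(alpha4, 2))
```

Because both alphas are ratios of moments to powers of μ₂, multiplying every score by a constant would leave them unchanged, which is what makes them abstract numbers.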

Calculation of Moments for Grouped Data. Measures of 
skewness and kurtosis tend to be unreliable in small samples, and 
ordinarily the measures are worth computing only when a statistical 
series is long enough to warrant grouping. 

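When data are grouped in class intervals of constant size i and
coded in the d unit, the formulas for the second, third, and fourth
moments about the mean are:

$$
\begin{aligned}
\mu_2 &= \left[\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2\right] i^2 \\
\mu_3 &= \left[\frac{\Sigma fd^3}{N} - 3\left(\frac{\Sigma fd^2}{N}\right)\left(\frac{\Sigma fd}{N}\right) + 2\left(\frac{\Sigma fd}{N}\right)^3\right] i^3 \\
\mu_4 &= \left[\frac{\Sigma fd^4}{N} - 4\left(\frac{\Sigma fd^3}{N}\right)\left(\frac{\Sigma fd}{N}\right) + 6\left(\frac{\Sigma fd^2}{N}\right)\left(\frac{\Sigma fd}{N}\right)^2 - 3\left(\frac{\Sigma fd}{N}\right)^4\right] i^4
\end{aligned}
\tag{4.22}
$$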


The application of the formulas, although entailing rather laborious
computations, is not difficult. The values for substitution in the
formulas are ordinarily quite easily obtained, as illustrated in
Table 4.14. Except for the columns headed fd³ and fd⁴, the computational
layout in Table 4.14 is exactly like that used in finding the
standard deviation of grouped and coded data. The entries in
column fd³ are obtained by multiplying the entries in the fd²
column by their respective d values, with attention to sign, and the
entries in the fd⁴ column are obtained by multiplying by d again.
The various sums are divided by N and substituted in formulas
(4.22), as shown in the space below the table. The student is cautioned
to exert care in dealing with the signs of the substituted
values. Most errors in computing moments arise from carelessness
with signs.

TABLE 4.14

COMPUTATION OF MOMENTS FOR GROUPED DATA
(Data from Table 2.3, p. 41)

SCORE        f     d     fd     fd²     fd³      fd⁴
780-809      1     8      8      64     512    4,096
750-779      2     7     14      98     686    4,802
720-749      3     6     18     108     648    3,888
690-719      6     5     30     150     750    3,750
660-689      7     4     28     112     448    1,792
630-659     12     3     36     108     324      972
600-629      8     2     16      32      64      128
570-599     15     1     15      15      15       15
540-569     14     0      0       0       0        0
510-539     16    −1    −16      16     −16       16
480-509     24    −2    −48      96    −192      384
450-479     14    −3    −42     126    −378    1,134
420-449      7    −4    −28     112    −448    1,792
390-419      6    −5    −30     150    −750    3,750
360-389      1    −6     −6      36    −216    1,296
330-359      2    −7    −14      98    −686    4,802

SUM        138          −19   1,321     761   32,617

Σfd/N = −19/138 = −.14;  Σfd²/N = 1,321/138 = 9.57;
Σfd³/N = 761/138 = 5.51;  Σfd⁴/N = 32,617/138 = 236.36

Substituting in formulas (4.22):
μ₂ = [9.57 − (−.14)²]i² = 9.55i²
μ₃ = [5.51 − 3(9.57)(−.14) + 2(−.14)³]i³ = 9.52i³
μ₄ = [236.36 − 4(5.51)(−.14) + 6(9.57)(−.14)² − 3(−.14)⁴]i⁴ = 240.57i⁴

There is rarely any need to substitute the value of the class interval
i and express the moments in the original unit of measurement.
When the moments are substituted in formulas (4.20) and
(4.21), the i's cancel out, as shown below. By formula (4.20) the
skewness of the distribution of Table 4.14 is

$$\alpha_3 = \frac{9.52i^3}{9.55i^2\sqrt{9.55i^2}} = \frac{9.52}{9.55\sqrt{9.55}} = .32,$$

and by formula (4.21) the kurtosis is

$$\alpha_4 = \frac{240.57i^4}{(9.55i^2)^2} = \frac{240.57}{(9.55)^2} = 2.64.$$


It will be noted that, although the alpha measures of skewness 
and kurtosis indicate right-hand skewness and platykurtosis in the 
distribution under consideration, as did formulas (4.18) and (4.19), 
they do not give the same results numerically. It is always necessary 
in describing skewness and kurtosis to refer to the method by which 
they are measured. 
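
The grouped-data procedure can be sketched as follows, using the frequencies and coded deviations of Table 4.14. The helper name `raw` is an illustrative choice. The sketch carries full precision throughout, so its results may differ in the last reported digit from figures obtained with quotients rounded to two decimals.

```python
# Frequencies and coded deviations d from Table 4.14, highest interval first.
f = [1, 2, 3, 6, 7, 12, 8, 15, 14, 16, 24, 14, 7, 6, 1, 2]
d = [8, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7]

n = sum(f)

def raw(k):
    """sum(f * d**k) / N: the k-th moment about the arbitrary origin, in d units."""
    return sum(fi * di ** k for fi, di in zip(f, d)) / n

m1, m2, m3, m4 = (raw(k) for k in (1, 2, 3, 4))

# Formulas (4.22) with the factor i omitted, since it cancels in the alphas.
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

alpha3 = mu3 / mu2 ** 1.5   # formula (4.20)
alpha4 = mu4 / mu2 ** 2     # formula (4.21)

print(n, round(m1, 2), round(m2, 2), round(m3, 2), round(m4, 2))
print(round(alpha3, 2), round(alpha4, 2))
```

Leaving i out of the computation is the same simplification the text recommends: the alphas are unaffected by the size of the class interval.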

Accuracy of Measures of Skewness and Kurtosis. The ques- 
tion of the degree of accuracy, in view of the approximate nature of 
data, of the various measures of skewness and kurtosis, is complex, 
and no generally correct answer can be given. In the interests of 
convenience and uniformity, however, it is suggested that all 
measures of skewness and kurtosis, like measures of variability, be 
reported with the same precision as the mean. (See p. 175.) 

Uses of Measures of Skewness and Kurtosis. Although a full 
appreciation of measures of skewness and kurtosis depends upon 
knowledge regarding theoretical frequency curves, a special and 
highly important case of which is the normal curve, some idea about 
their use is not difficult to grasp. These measures indicate the extent 
of “nonnormal” variation in a series. The alpha statistics, like the 
mean and standard deviation, are members of the moments system. 
The four quantities, M, σ, α₃, α₄, are sometimes called the descriptive
constants of the frequency distribution. They are the “comparatively
few numerical values” to which Professor Fisher refers in the paragraph
quoted on pp. 5-6. The four constants (or the four percentile
measures, Mdn, Q or D, Skₚ, and Kuₚ) and the number of cases N
convey all of the information ordinarily needed to understand and
interpret unimodal distributions.

Although measures of skewness and kurtosis are not serviceable 
in the many ways that measures of central tendency and variability 
are, they are more fundamental than the latter, in the sense that
they quantitatively indicate departure from normality. The great
majority of techniques applied to sampling problems presuppose
normality at some point in their application. Moreover, even the
simple descriptive measures of central tendency and variability, as 
we have seen, tend to lose meaning as a distribution departs from 
normality. The type of distribution is always of first concern in 
statistical analysis. 

It has been argued that skewness and kurtosis ordinarily have 
little meaning in educational measurement, since normality or lack 
of normality of a distribution of test scores is consequent merely to 
the number and difficulty of the items in the test. For example, if 
easy items are added to a test which has been giving a sensibly 
normal distribution of scores for a certain group, the test will then 
usually give a distribution skewed to the left and somewhat leptokurtic.
This is not a very impressive argument, and we shall consider it
at some length in the next chapter. At this point, we shall only note 
that measures of central tendency and variability, as well as meas- 
ures of skewness and kurtosis, are affected by changes in a test. 
We make analyses on the basis of the mean, standard deviation, and 
degree of normality of the scores on the test as constructed, not on 
the test which might have been. 

Measures of skewness and kurtosis make it possible to describe 
departure from normality exactly. They give unambiguous meaning
to such conversational conveniences as “approximately normal,”
“markedly nonnormal,” and “severely skewed.” Let us look at the
alpha values of the distributions shown in Fig. 4.10. The alphas 
indicate that the distribution of sample 5 is approximately meso- 
kurtic; that of sample 8, symmetrical. The greatest departure from 
normal peakedness occurs in sample 8, and the greatest departure 
from symmetry in sample 5. The alphas are always interpreted in 
this way. The more α₃ differs from 0, the more skewed the distribution;
the more α₄ differs from 3, the greater the departure from
normal peakedness. Careful examination of the frequency polygons 
of Fig. 4.10 will help the student in understanding and interpreting 
the alpha measures. 

In educational measurement, skewness has generally been of 
more concern than kurtosis, both because skewness is easily detected
and because it makes central tendency difficult to interpret.
Although nonnormal peakedness does not directly affect central
tendency, it does distort normal standard score and area relation- 
ships. In this respect nonnormal peakedness tends to be more con- 
founding than skewness. 

Measures of skewness and kurtosis permit comparisons of the
shape of two or more given distributions. As descriptive measures
they supplement the measures of central tendency and variability,
in that they provide information regarding the manner in which
the items in a series are scattered about the average value. As was
noted above, if a distribution is sensibly unimodal it may be
comprehensively described in terms of the four measures, M, σ, α₃, and α₄.

Fig. 4.10. Polygons with varying degrees of skewness and kurtosis.
(Distributions 3, 5, 6, and 8, Table 3.6, p. 98.) Sample 5: α₃ = +0.56,
α₄ = 3.09. Sample 8: α₃ = +0.09, α₄ = 2.27.


Measures of skewness and kurtosis also are useful in making in- 
ferences from sample data regarding the form of population distribu- 
tion. This is an application we shall take up in a later chapter. 


Exercises 


65. Find the percentile measures of skewness Sk, and kurtosis Ku, of the
distributions whose frequency polygons are shown in Fig. 4.10. Do
these agree in meaning with the alpha measures?

66. Compute α₃ and α₄ of the MAT scores of exr. 20, p. 62.

67. In a city school system, the distribution of chronological ages of
sixth-grade pupils is unimodal with an α₃ of .55; in a second city, a
similar distribution has an α₃ of .15. What do these facts suggest about
the promotion policies in the two school systems?

68. The distribution of scores for one large group is unimodal with M =
51.50, σ = 8.45, α₃ = .36, and α₄ = 3.48. In a similar distribution for
a second large group, M = 46.40, σ = 6.25, α₃ = −.16, and α₄ = 3.02.
Sketch frequency polygons and compare the groups in as many ways
as you can.

Chapter V
The Normal Curve 


AT VARIOUS places in the preceding chapters, reference has been 
made to the normal distribution as a type of frequency distribution 
which many sets of observational data tend to approximate. The 
student will find it helpful to review pp. 68-73 at this time. 

The normality of data is a concept of great usefulness in statistical 
theory and practice, and no student can use and interpret statistics 
successfully without some understanding of the normal curve. It is 
no exaggeration to say that the “normal law,” namely, the greater
a deviation from the mean or expected value in a series the less
frequently it occurs, is the very foundation of statistical theory.


The Normal Curve as a 
Limiting Form 


The student will recall from plane geometry that the circle is
the limiting form of the regular polygon, i.e., as the number of
sides of either the inscribed or circumscribed polygon is increased,
the perimeter and area of the polygon approach those of the circle
as limits. It is easy to visualize the results of increasing
the number of sides of either the inscribed or the circumscribed
polygon shown in Fig. 5.1. Either of the polygons can be made to
approach the circle as closely as is desired by increasing its number
of sides. For this reason the circle may be thought of as a regular
polygon having an infinite number of sides.

Fig. 5.1. The circle as the limiting form of the inscribed and
circumscribed polygons.

The normal curve may be thought of as the limiting form of the
frequency polygon of normally distributed data. When we group
such data in smaller and smaller classes, their frequency polygon
resembles more and more the normal curve. Consider the 400
normally distributed scores of Table III, Appendix B. The fictitious
nature of those data need not concern us here; later we shall describe
several situations which give rise to theoretically normal data. In
Table 5.1 the 400 scores are shown grouped in class intervals of 9,
7, 5, and 3. The frequency polygons of the four distributions are
plotted in Fig. 5.2. When we inspect them we note that the smaller
the grouping interval or, what amounts to the same thing, the
greater the number of sides of the frequency polygon, the more
nearly the polygon resembles the normal curve. If we had a very
large number of continuous normal scores, we could make the
grouping interval as small as we please and still have frequencies
for each interval. By making the grouping interval smaller and
smaller, we might approximate the smooth normal curve to any
desired degree of exactness. For this reason, the normal curve may
be thought of as the limiting form of the frequency polygon of
normally distributed data. It follows of course that the
relationships between class frequencies and heights and between total
frequency and area in the frequency polygon will hold in the normal
curve.


TABLE 5.1

400 NORMALLY DISTRIBUTED SCORES GROUPED BY INTERVALS
OF 9, 7, 5, AND 3

SCORE     f     SCORE     f     SCORE     f     SCORE     f
                                                69-71     1
                                                66-68     1
                                                63-65     3
                                                60-62     5
                                68-72     1     57-59    10
                                63-67     4     54-56    15
                65-71     3     58-62    11     51-53    24
63-71     5     58-64    13     53-57    26     48-50    32
54-62    30     51-57    43     48-52    49     45-47    40
45-53    96     44-50    86     43-47    69     42-44    45
36-44   138     37-43   110     38-42    80     39-41    48
27-35    96     30-36    86     33-37    69     36-38    45
18-26    30     23-29    43     28-32    49     33-35    40
9-17      5     16-22    13     23-27    26     30-32    32
                9-15      3     18-22    11     27-29    24
                                13-17     4     24-26    15
                                8-12      1     21-23    10
                                                18-20     5
                                                15-17     3
                                                12-14     1
                                                9-11      1
TOTAL   400             400             400             400


Fig. 5.2. Frequency polygons of normally distributed data grouped
by intervals of 9, 7, 5, and 3. (From Table 5.1.)
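
The effect of the grouping interval can be checked mechanically. The following Python sketch is an illustration only: it takes the 400 scores as tabulated by unit intervals (the tabulation given in exr. 1 of the exercises that follow) and sums them into classes of 9, 7, 5, and 3. The function name `regroup` is our own, and the lowest and highest class limits are assumptions chosen to match the class limits of Table 5.1.

```python
# Frequencies of the 400 scores by single score, from 11 through 69.
LOWEST_SCORE = 11
FREQS = [
    1, 0, 1, 0, 1, 1, 1, 1, 2, 2,            # scores 11-20
    3, 3, 4, 4, 5, 6, 7, 8, 9, 9,            # scores 21-30
    11, 12, 13, 13, 14, 14, 15, 16, 16, 16,  # scores 31-40
    16, 16, 15, 14, 14, 13, 13, 12, 11, 9,   # scores 41-50
    9, 8, 7, 6, 5, 4, 4, 3, 3, 2,            # scores 51-60
    2, 1, 1, 1, 1, 0, 1, 0, 1,               # scores 61-69
]


def regroup(width, lowest_limit, highest_limit):
    """Sum the unit frequencies into classes of the given width.

    Returns (lower limit, upper limit, frequency) for each class,
    from the bottom class upward.
    """
    unit = {LOWEST_SCORE + i: f for i, f in enumerate(FREQS)}
    classes = []
    for lower in range(lowest_limit, highest_limit + 1, width):
        upper = lower + width - 1
        total = sum(unit.get(score, 0) for score in range(lower, upper + 1))
        classes.append((lower, upper, total))
    return classes


by_9 = regroup(9, 9, 71)
by_7 = regroup(7, 9, 71)
by_5 = regroup(5, 8, 72)
by_3 = regroup(3, 9, 71)

for name, grouping in (("9", by_9), ("7", by_7), ("5", by_5), ("3", by_3)):
    print(f"interval of {name}: {len(grouping)} classes,",
          [f for _, _, f in grouping])
```

As the interval shrinks from 9 to 3 the number of classes rises from 7 to 21, which is the sense in which the polygon acquires more sides and comes to resemble the smooth curve.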
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Exercises 


1. The 400 scores of Table III, Appendix B, grouped by intervals of 1 are
shown below. Construct a frequency polygon of the distribution and
compare with the polygons of Fig. 5.2.


SCORE    f    SCORE    f    SCORE    f    SCORE    f    SCORE    f
11       1    23       4    35      14    47      13    59       3
12       0    24       4    36      14    48      12    60       2
13       1    25       5    37      15    49      11    61       2
14       0    26       6    38      16    50       9    62       1
15       1    27       7    39      16    51       9    63       1
16       1    28       8    40      16    52       8    64       1
17       1    29       9    41      16    53       7    65       1
18       1    30       9    42      16    54       6    66       0
19       2    31      11    43      15    55       5    67       1
20       2    32      12    44      14    56       4    68       0
21       3    33      13    45      14    57       4    69       1
22       3    34      13    46      13    58       3


2. The successive terms of the binomial (1/2 + 1/2)^n give the theoretical
relative frequencies of n, n − 1, n − 2, . . . , 2, 1, 0 heads in tossing
n coins. For example, if 4 coins are tossed, the relative frequencies
of 4, 3, 2, 1, 0 heads are 1/16, 4/16, 6/16, 4/16, 1/16, respectively. If
4 coins were tossed 16 times, theoretically 4 heads and 0 tails would
occur once; 3 heads and 1 tail, four times; 2 heads and 2 tails, six times;
1 head and 3 tails, four times; and 0 heads and 4 tails, once. What would
it mean to say that the normal curve is the limiting form of the histogram
of the binomial distribution? (See Ref. 9, pp. 177-179, for an
algebraic proof that the normal curve is the limiting form of the binomial
distribution.)


The Equation of the Normal Curve 


There are several advantages to be gained when a set of observa- 
tional data can be represented by a mathematical equation. Mathe- 
matical equations in an empirical science, in general, permit parsi- 
monious description of data, provide the essential link between 
theory and observation, and make it possible to estimate the values 
of one or more variables from the values of related variables. 

The equation of a frequency distribution, when it can be
determined to an acceptable approximation, idealizes the distribution
and, it may be presumed, gives a better idea of the distributions of
further samples and the distribution in the population. When it
can be said that a sample distribution is approximated by some
ideal form whose equation is known, we may assume that the
population distribution is characterized by the general and invariant
properties of that form and we can make analyses and inferences
about the population which otherwise would not be possible.

Although the normal distribution is by no means the only type 
of frequency distribution whose equation is known, it is both the 
most common and the most important in statistical theory. The 
two variables in the equation of the normal curve are frequencies 
and values, i.e., the equation gives the frequencies with which the
different values or scores in a normal distribution occur. Since the 
class frequencies in a distribution correspond to the heights of the 
vertices of the polygon, we may work with height at this point 
without loss of meaning. 

The conventional equation of the normal curve is

$$y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-\frac{(X - M)^2}{2\sigma^2}} \qquad (5.1)$$

in which y is the height or ordinate of the curve at X, X being any
point on the given scale of scores of which M* is the mean, N is
the total frequency or number of scores in the distribution, σ* is
the standard deviation of the distribution, and π and e are
well-known mathematical constants.

The Equation of the Standard Normal Curve. If we express 
the frequencies in the classes of any distribution as proportions, the 
total frequency will be the sum of the proportions. Thus, the total 
frequency or N will equal 1. Furthermore, if we use standard scores 
instead of raw scores, the standard deviation of any distribution is 1. 
Hence, if we agree to use proportional frequencies and standard
scores, we may simplify equation (5.1) to

$$y' = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{z^2}{2}} \qquad (5.2)$$


* The symbols М and 6 would be more appropriate, since (5.1) is the equa- 
tion of a normal population distribution, rather than an observed distribution, 
but the distinction is unimportant at this point. 
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in which y’ is the ordinate corresponding to a given standard score 
z in a normal distribution. By substituting for π and e their
numerical values 3.142 and 2.718, respectively, we can further simplify
the equation of the normal curve to

$$y' = (.3989)\,(2.718)^{-\frac{z^2}{2}} \qquad (5.3)$$

Taking logarithms in (5.3) results in the equation

$$\log y' = \log .3989 - \frac{z^2}{2}\,\log 2.718. \qquad (5.4)$$

These equations represent any and all normal distributions in
which frequencies have been transformed to proportions and raw scores
to standard scores.
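
The use of the logarithmic form can be illustrated by a short computation. The Python sketch below is offered only as an illustration: it applies equation (5.4) with common (base 10) logarithms, which is an assumption on our part (any base would serve), and the function name `ordinate_by_logs` is hypothetical.

```python
import math


def ordinate_by_logs(z):
    """Ordinate y' of the standard normal curve by equation (5.4).

    Common logarithms are assumed; the constants .3989 and 2.718
    are the rounded values used in equation (5.3).
    """
    log_y = math.log10(0.3989) - (z ** 2 / 2) * math.log10(2.718)
    return 10 ** log_y  # take the antilogarithm


y_at_0 = ordinate_by_logs(0.00)
y_at_3 = ordinate_by_logs(3.00)

print(f"z = 0.00, y' = {y_at_0:.4f}")
print(f"z = 3.00, y' = {y_at_3:.4f}")
```

At z = 0 the second term vanishes and the ordinate is simply .3989, the maximum height of the curve.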
The height or ordinate y' for any value of z may readily be
computed by use of equation (5.4). The values of y' for selected values
of z are shown below and plotted in Fig. 5.3.

    When z =  .50, y' = …
         z = 1.00, y' = …
         z = 1.50, y' = …
         z = 2.00, y' = …
         z = 2.50, y' = …
         z = 3.00, y' = .0044

Fig. 5.3. The standard normal curve.

When we inspect the equation of the standard normal curve
(sometimes called the unit normal curve) and Fig. 5.3, the following
properties of the curve stand out clearly:


a. The curve has its maximum height or ordinate at z = 0.

b. Since z is squared, the ordinate of the curve for a negative value
of z is the same as the ordinate for a positive value. Thus the
curve is symmetrical about the ordinate at z = 0.

c. Properties (a) and (b) above indicate that the mean, median,
and mode in the normal distribution coincide.

d. The curve changes from convex to concave curvature at z = ±1,
i.e., the curve intersects its own tangents at these points.

e. The curve approaches but never touches the base line within
finite limits.


By the methods used above, we might compute the ordinates of
the standard normal curve corresponding to any given value of z,
but it is not necessary. Various tables of ordinates are available, and
one such table, Table B, giving the ordinates for values of z from
0.00 to 3.00 and a few values above 3.00, is included in Appendix C.
It will be noted that the second decimal place in z is recorded at the
top of the table. If we wish to find the ordinate at, say, z = 2.15,
we locate 2.1 in the left-hand column, go across the row until we
reach the column headed .05, and read .0396. As has been noted,
the ordinate corresponding to a negative value of z is the same as
that corresponding to a positive value; hence, it is not necessary to
include negative values of z in the table.
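
The layout of such a table is easily imitated. The Python sketch below is an illustration only and is not a reproduction of Table B: it builds a small table of ordinates with rows for tenths of z and columns for the second decimal place, using `NormalDist` from the Python standard library. The names `ordinate_table` and `table_b` are our own.

```python
from statistics import NormalDist

STANDARD = NormalDist(mu=0.0, sigma=1.0)


def ordinate_table(max_tenths=30):
    """Map (tenths of z, second decimal of z) to the ordinate y'.

    The key (21, 5) stands for row 2.1, column .05, i.e. z = 2.15.
    Ordinates are rounded to four places, as in a printed table.
    """
    table = {}
    for tenths in range(max_tenths + 1):
        for hundredths in range(10):
            z = tenths / 10 + hundredths / 100
            table[(tenths, hundredths)] = round(STANDARD.pdf(z), 4)
    return table


table_b = ordinate_table()

# Row 2.1, column .05: the ordinate at z = 2.15.
y_215 = table_b[(21, 5)]
# Row 0.0, column .00: the maximum ordinate at the mean.
y_000 = table_b[(0, 0)]

print(f"ordinate at z = 2.15: {y_215}")
print(f"ordinate at z = 0.00: {y_000}")
```

Because the curve is symmetrical, the same dictionary serves for negative values of z: one looks up the absolute value.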


Exercises 


3. Using the ordinates at various values of z, as given in Table B, plot
a standard normal curve. In order to bring out details of curvature,
plot several points in the neighborhoods of z = 0, z = +1, and z = −1.

4. In a given normal distribution of very large N, the mean is 50 and the
standard deviation is 10. What will be the relative values of the
ordinates of the curve of the distribution at the following points on the
scale of scores: 20, 25, 30, 40, 50, 60, 70, 75, 80?


Areas and Frequencies under the Normal Curve 


We have seen that the normal curve is a frequency polygon having 
an infinite number of sides, and that consequently the area under 
the curve corresponds to the number of cases in the distribution it 
represents. In deriving the equation of the standard normal curve, 
equation (5.2), relative frequencies or proportions were employed; 
hence, the total area under the standard normal curve equals 1 or
unity. (Practically, of course, the area under the curve is always 
somewhat less than 1, since the curve does not touch the base line 
or abscissa within finite limits. The practical limits are usually 
considered to extend from about —3 to +3 on the scale of z and 
thus to include about 99.7 per cent of the area.) 

The fact that the area under the standard normal curve may be
represented by unity makes it possible to compute, once and for
all, the proportion of the area which is included between any two
ordinates of the curve, and hence the proportion of the total
number of cases falling between the z values of the ordinates. The actual
computation of such proportions of area is a matter of considerable
mathematical complexity, and we shall not attempt to show here
how it is done. The understanding and use of the proportions once
they are available, however, is not difficult. Since the functions of the
normal curve underlie a great deal of both statistical theory
and technique, we shall consider them at some length. After the
area relationships are understood, we shall find it easy to
deal with questions regarding the number of scores or frequency
in any interval of a normal distribution and related questions.

Proportions of Area between Given Ordinates of the Normal Curve.
The proportions of area under the normal curve included by the
ordinate at the mean (z = 0) and the ordinates at .01z distances from
the mean are given in Table A, Appendix C. The left-hand column
of the table gives z values to tenths; the second decimal place in z
is to be found across the top of the table. If we wish to know the
area between the ordinate at z = 0.00 and the ordinate at z = 1.96,
we go down the left-hand column to 1.9, over to the column headed
.06, and read .4750. The area in question is shown in Fig. 5.4.
Since the total area under the curve to the right of the ordinate
at z = 0.00 is .5000, the area to the right of the ordinate at z =
1.96 is .0250.

Fig. 5.4. Proportion of area under normal curve between ordinates
at z = 0 and z = 1.96.

Now suppose that we wish to know the proportion of area under
the curve between the ordinates at, say, z = −1.00 and z = +1.00.
From the table we find the area under the curve between the
ordinates at z = 0.00 and z = 1.00 to be .3413. Since the same
proportion of area lies between the ordinate at z = 0.00 and z = −1.00,
the ordinates at z = −1.00 and z = +1.00 include .3413 + .3413
or .6826 of the area, as shown in Fig. 5.5.

Now suppose that we wish to find the area under the curve
between the ordinates at z = .50 and z = 2.50. The area between the
ordinates at z = 0.00 and z = 2.50 is .4938 and the area between
the ordinates at z = 0.00 and z = .50 is .1915. Hence, the area
between the ordinates at z = .50 and z = 2.50 is .4938 − .1915 or
.3023, as shown in Fig. 5.6.

Fig. 5.5. Proportion of area under normal curve between ordinates
at z = −1.00 and z = +1.00.

Fig. 5.6. Proportion of area under normal curve between ordinates
at z = .50 and z = 2.50.


If it is desired to express proportions of area as percentages we 
may, of course, merely multiply by 100. Thus, .3023 of the area 
may be expressed as 30.23 per cent of the area. 
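
The three readings of Table A made above can be verified without the table. The Python sketch below is an illustration only; it uses the cumulative function of `NormalDist` in the Python standard library in place of Table A, and the function name `area_between` is our own. Since the library works to more places than a four-place table, small differences in the last figure are to be expected.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # mean 0, standard deviation 1


def area_between(z_low, z_high):
    """Proportion of area under the standard normal curve between two ordinates."""
    return STANDARD.cdf(z_high) - STANDARD.cdf(z_low)


mean_to_196 = area_between(0.00, 1.96)    # text: .4750
beyond_196 = 1.0 - STANDARD.cdf(1.96)     # text: .0250
within_one = area_between(-1.00, 1.00)    # text: .3413 + .3413 = .6826
half_to_250 = area_between(0.50, 2.50)    # text: .4938 - .1915 = .3023

for label, area in (
    ("z = 0.00 to 1.96", mean_to_196),
    ("beyond z = 1.96", beyond_196),
    ("z = -1.00 to +1.00", within_one),
    ("z = 0.50 to 2.50", half_to_250),
):
    print(f"{label:20s} {area:.4f}  ({100 * area:.2f} per cent)")
```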

Determining the Interval Which Includes a Given Proportion
of Area. We may use Table A to determine the z values of the
ordinates which include specified proportions or percentages of the
area under the standard normal curve. For example, suppose we
wish to determine the z values of the ordinates which include the
middle 50 per cent or .5000 of the area. Since the middle .5000 is
specified, it follows that .2500 will lie on either side of the ordinate
at the mean. Hence, we find the proportion in the body of Table A
which is nearest to .2500. This is .2486, which corresponds to a z
value of .67, and .67 is the best approximation we can make without
interpolation. To two-figure accuracy, then, the z interval which
includes the middle .5000 of the area is −.67 to +.67. By linear
interpolation we would obtain .67 + (14/31)(.01), or .6745 to
four-figure accuracy. (See Fig. 5.7.)
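
The reverse use of the table can be sketched in the same way. The Python illustration below does two things: it finds the z value directly with `inv_cdf` from the standard library, and it repeats the linear interpolation between neighboring table entries. The tabled area of .2517 for z = .68 is not quoted in the text and is assumed here from a standard four-place table; the function names are our own.

```python
from statistics import NormalDist

STANDARD = NormalDist()


def z_for_middle(proportion):
    """z such that the interval from -z to +z includes the given middle proportion."""
    return STANDARD.inv_cdf(0.5 + proportion / 2)


def interpolate_z(target_area, z_low, area_low, z_high, area_high):
    """Linear interpolation between two neighboring entries of a table of areas."""
    fraction = (target_area - area_low) / (area_high - area_low)
    return z_low + fraction * (z_high - z_low)


z_direct = z_for_middle(0.5000)
# Entries bracketing .2500: z = .67 gives .2486 (from the text);
# z = .68 gives .2517 (assumed from a standard four-place table).
z_interpolated = interpolate_z(0.2500, 0.67, 0.2486, 0.68, 0.2517)

print(f"direct:       z = {z_direct:.4f}")
print(f"interpolated: z = {z_interpolated:.4f}")
```

The same function `z_for_middle` may be applied to other middle proportions, such as .95 or .99.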

It is suggested that the student work exrs. 5 and 6, p. 202, at
this time, so that he will become better acquainted with the
construction and use of Table A.

Fig. 5.7. The z values of the ordinates which include the middle
.5000 of the area under the normal curve.

Relative and Absolute Frequencies from Areas. Owing
to the correspondence of increments of area under the standard
normal curve to relative frequencies in intervals, Table A
might just as well be titled, “Relative Frequency of Nor[…]
Distance from the Mean.” In a normal distribution .3413 of the
scores lie in the interval bounded by z = 0.00 and z = −1.00;
.3413 lie in the interval bounded by z = 0.00 and z = +1.00;
.5000 lie in the interval bounded by z = −.6745 and z = +.6745;
and so on. Such facts as these were anticipated in Chapter IV and
were shown to hold approximately for several somewhat nonnormal
distributions. In the normal distribution, of course, they are exact.

Obviously, for a given distribution, the relative frequencies may
be changed to absolute frequencies by multiplying by the total
frequency N. For example, if there are 1,000 scores in a normal
distribution, 1,000 × .3413 or about 341 lie in the interval bounded
by z = 0.00 and z = 1.00; 1,000 × .3023 or about 302 lie in the
interval z = .50 to z = 2.50 (cf. Fig. 5.6); and so on.

In order to use the areas of Table A in determining frequencies in
specified intervals on the scale of normally distributed scores, we
must always work with standard or z scores. Suppose, for example,
that we have a set of 500 normally distributed scores, and that the
mean and standard deviation of the set are 100.00 and 15.00,
respectively. In order to determine the number of scores lying
between, say, 88 and 130, we must first change 88 and 130 to z scores.
Since the mean is 100.00 and the standard deviation is 15.00, the
z scores corresponding to 88 and 130 are −.80 and +2.00,
respectively. Turning to Table A, we find that the proportion of scores
lying between −.80 and the mean is .2881 and the proportion
between the mean and +2.00 is .4772. Hence, the proportion of scores
between −.80 and 2.00 is .7653, and the number of scores is .7653
× 500 or 382.65. (See Fig. 5.8.)
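
The two steps of this example, conversion to z scores and multiplication of the included proportion by N, can be set out as follows. The Python sketch is an illustration only; the function name `expected_count` is our own, and the library's cumulative function again stands in for Table A, so the result may differ from 382.65 in the first decimal place.

```python
from statistics import NormalDist


def expected_count(n, mean, sd, low, high):
    """Number of scores expected between two raw scores in a normal distribution.

    Returns the two z scores and the expected frequency.
    """
    # Step 1: change the raw scores to z scores.
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    # Step 2: proportion of area between the two ordinates, times N.
    standard = NormalDist()
    proportion = standard.cdf(z_high) - standard.cdf(z_low)
    return z_low, z_high, proportion * n


z_low, z_high, count = expected_count(500, 100.0, 15.0, 88, 130)

print(f"z scores: {z_low:+.2f} and {z_high:+.2f}")
print(f"expected number of scores: {count:.2f}")
```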

Fig. 5.8. Number of scores falling between 88 and 130 in a
normal distribution, N = 500 […]. (z-score axis: −0.80, 0, 2.00;
raw-score axis: 88, 100, 130.)

Fitting a Normal Curve to a Given Frequency Distribution.
The procedure for fitting a normal curve to a given frequency
distribution involves primarily the calculation of the
frequencies which would be observed in the classes of the
distribution, if the scores were in fact distributed normally.

The procedure is illustrated in Table 5.2. Although the table is
largely self-explanatory, a few remarks may be helpful. The table is
arranged to facilitate the calculation of the proportions of area or
relative frequencies between the real limits of the respective classes.
After these are determined they are multiplied by 138, the N in the
present example, to obtain the absolute theoretical normal
frequencies in classes. The bottom class interval is considered to extend
from −∞ to 389.5 and the top interval from 749.5 to +∞ on the
scale of scores. The z values of the real class limits shown in the
table are found, of course, by subtracting the mean from the limits
and dividing by the standard deviation. Thus the z value of 599.5 is
(599.5 − 552.11)/79.32 or .60. The proportions in column 5 are
obtained from Table A, and those in column 6 are the successive
differences between the proportions in column 5. Finally, the
proportions are multiplied by 138. These products, shown in the last
column of the table, are the expected or theoretical frequencies in
similar classes of a normal distribution in which N = 138, M =
552.11, and σ = 79.32. If the student sketches a normal curve and
indicates the z values of the real class limits, he will find it easy
to follow the computations laid out in Table 5.2. (See the partial
sketch of Fig. 5.9.)


TABLE 5.2

COMPUTATION OF THEORETICAL NORMAL FREQUENCIES
IN CLASS INTERVALS OF A GIVEN DISTRIBUTION IN
WHICH N = 138, M = 552.11, σ = 79.32
(Distribution of MAT scores, Table II, Appendix B)


CLASS        f    REAL     z VALUE    PROPORTION    PROPORTION    THEORETICAL
INTERVAL          CLASS    OF REAL    OF AREA       OF AREA,      NORMAL
                  LIMIT    CLASS      BETWEEN       ΔA, BETWEEN   FREQUENCY
                           LIMIT      MEAN AND      CLASS         IN CLASS:
                                      REAL CLASS    LIMITS        ΔA × 138
                                      LIMIT

750-779      1                                      .0064           .88
                  749.5     2.49      .4936
720-749      …                                      .0110          1.52
                  719.5     2.11      .4826
690-719      …                                      .0244          3.37
                  689.5     1.73      .4582
660-689      …                                      .0467          6.44
                  659.5     1.35      .4115
630-659      …                                      .0750         10.35
                  629.5      .98      .3365
600-629     13                                      .1108         15.29
                  599.5      .60      .2257
570-599      …                                      .1386         19.13
                  569.5      .22      .0871
540-569      …                                      .1507         20.80
                  539.5     −.16      .0636
510-539      …                                      .1418         19.57
                  509.5     −.54      .2054
480-509      …                                      .1158         15.98
                  479.5     −.92      .3212
450-479      …                                      .0803         11.08
                  449.5    −1.29      .4015
420-449     10                                      .0510          7.04
                  419.5    −1.67      .4525
390-419      …                                      .0273          3.77
                  389.5    −2.05      .4798
360-389      …                                      .0202          2.79

SUM        138                                     1.0000        138.01
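
The computations of Table 5.2 follow a fixed routine and may be sketched in Python as below. This is an illustration under stated assumptions, not the author's procedure in every detail: the z values are rounded to hundredths as they would be for entry into Table A, but the areas are taken from the cumulative function of the standard library rather than from a four-place table, so the frequencies may differ slightly from those of the table in the second decimal. The function name `theoretical_frequencies` is our own.

```python
from statistics import NormalDist

N, MEAN, SD = 138, 552.11, 79.32

# Real limits separating the 14 classes: 389.5, 419.5, ..., 749.5.
LIMITS = [389.5 + 30 * k for k in range(13)]


def theoretical_frequencies(n, mean, sd, limits):
    """Theoretical normal frequencies in the classes bounded by the given limits.

    The bottom class is taken to extend downward without limit and the
    top class upward without limit. Classes are returned from the bottom up.
    """
    standard = NormalDist()
    # z values of the real class limits, rounded to hundredths.
    z_values = [round((limit - mean) / sd, 2) for limit in limits]
    # Cumulative proportions below each limit, with 0 and 1 at the extremes.
    cumulative = [0.0] + [standard.cdf(z) for z in z_values] + [1.0]
    # Successive differences give the proportion of area in each class.
    proportions = [high - low for low, high in zip(cumulative, cumulative[1:])]
    return z_values, [n * p for p in proportions]


z_values, expected = theoretical_frequencies(N, MEAN, SD, LIMITS)

lower_bounds = [None] + LIMITS
for lower, frequency in zip(lower_bounds, expected):
    label = "below 389.5" if lower is None else f"from {lower}"
    print(f"{label:12s} {frequency:6.2f}")
print(f"sum          {sum(expected):6.2f}")
```

The frequencies necessarily sum to N, since the proportions of area in the classes together exhaust the area under the curve.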


The observed frequencies of column 2, Table 5.2, may be
graphically compared with the theoretical normal frequencies of the last
column by means of frequency polygons or histograms. The former
tend to bring out more clearly the extent to which the given
distribution is fitted by the normal curve. In making the graph, a frequency
polygon of the given distribution is first drawn in the usual manner,
then the normal curve is drawn on the same axes, as shown in Fig.
5.10. Since the bottom and top classes of the normalized
distribution are of unspecified length, the curve must be extended
free-hand beyond the second from bottom and second from
top class mid-points. The intermediate points of the curve are,
of course, plotted from the theoretical normal frequencies.

In a later chapter we shall make use of the differences
between observed and theoretical normal frequencies in the
classes of a sample distribution in testing the assumption
of population normality.

Fig. 5.9. Proportions of area or relative frequencies in bottom
three classes of the normalized distribution of Table 5.2.

Fig. 5.10. Frequency polygon and fitted normal curve. (From Table 5.2.)
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Exercises 


5. By use of Table A, determine the proportion of area included by the
ordinates at the following z values. Sketch a curve and crosshatch
the area in each case.

   −∞ to 0.00           −∞ to −1.96
   0.00 to +∞           +1.96 to +∞
   −2.00 to +2.00       −1.96 to +1.96
   −3.00 to +3.00       +0.20 to +2.30
   0.00 to +2.58        −0.60 to +1.10
   −1.64 to 0.00        −1.70 to −0.40

6. By use of Table A determine the z values (to hundredths) of the
ordinates of the standard normal curve which include:

   a. The middle 25 per cent of the area.
   b. The middle 80 per cent of the area.
   c. The middle 95 per cent of the area.
   d. The middle 99 per cent of the area.

7. Prove that, in the normal distribution, Ku, of formula (4.19) equals
.263.

8. The mean and standard deviation of a normal distribution of 500 scores
are 75.0 and 12.5 respectively.

   a. How many scores lie between 50 and 100?
   b. How many scores lie below 62.5?
   c. What interval on the scale of scores includes the middle 250 scores?
   d. What deviation from the mean will be exceeded by 7 per cent of the
      scores?
   e. What deviation from the mean will be exceeded by 93 per cent of
      the scores?
   f. What are P[…] and P[…] of the distribution?
   g. Find Q of the distribution. (Cf. [c] above.)
   h. If a score were selected at random from the 500, what are the
      chances that it will fall in the interval 62.5-87.5? That it will fall
      below 50? That it will fall above 100?

9. Given a normal distribution in which N = 300, M = 42.00, and σ =
7.50, theoretically,

   a. How many scores will fall in the class whose real limits are 39.5 and
      44.5?
   b. How many scores will fall in the class whose upper real limit is
      54.5?

10. Find the theoretical normal frequencies in the classes of one or more
of the distributions of mental ages given in exr. 21, p. 63. Compare
graphically the observed and theoretical normal frequencies. Save the
results for later reference.


The Concept of Normality in Statistics 


The normal curve is a mathematical ideal. We never have an 
infinite number of observational data, and the data we do have in 
a given situation invariably show some departure from normal form. 
If any research worker has ever observed normally distributed 
data, the writer does not know about it. 

The fact that observational data are rarely if ever of truly normal 
form, however, detracts little from the usefulness of the normal 
curve in statistical theory. The tendency of raw data and statistics, 
such as means, derived from samples to approximate normality 
obtains in many situations and under many conditions. In these 
situations the use of the normal curve properties in analyzing data 
is both logically defensible and enormously useful. 

The lack of reality in the assumptions underlying normal curve 
theory is quite like that characterizing the mathematical treatment 
of observational data in any field. The agreement between theory 
and observation is never perfect, and, somewhat paradoxically, 
the more precisely we attempt to reconcile theory and observation, 
the more aware we become of inherent uncertainties in both. The 
need to minimize such uncertainties by refining both theory and 
observation is ever present and fundamental, but in the practical 
situation we are justified in using quite imperfect theories if they 
demonstrably increase our success in dealing with observations. 

In the field of education and psychology, as well as in various 
other fields, normal distribution theory has been in wide approxi- 
mate agreement with observation and has been remarkably fruit- 
ful. Professor Kelley observes (Ref. 3, preface): 

Many years ago my inspiring teacher, Henry Lewis Rietz, observed
that statistical procedures derived from the normal distribution were
born of a higher realm than other procedures. I disbelieved this with a
religious fervor. Though, as time has passed, I have espoused curvilinear
regression and skew distributions with gusto, I have found myself
frequently slipping, for the data would not support me, and linear
relationships and nearly normal distributions have in my experience as a
psychologist cropped up with a frequency which has chided and mocked
me. I still reserve judgment as to the place of birth of the normal
distribution, but that its sphere of usefulness is extended in connection with
biological and psychological phenomena I no longer have the slightest
doubt.


The Conditions of Normality. The writer has listened to many 
heated arguments regarding the “true” normality of intelligence, 
scholastic achievement, and other phenomena which vary from 
individual to individual, and has always, like the poetic sage, come 
out of the same door in which he went. 

To break the circularity and ambiguity of arguments for and 
against normality, it is necessary that the conditions which give 
rise to the normal distribution be identified. The most important 
of these conditions are (1) the “causal” factors affecting individual 
items are very numerous and independent in action; (2) negative 
and positive effects tend to be equal in number, equal in absolute 
weight, and very small; and (3) the total effect is the algebraic sum 
of the individual effects. (Cf. Ref. 7, p. 115.) 
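
The three conditions can be illustrated in their simplest form. The Python sketch below is an illustration under assumptions of our own: each "causal" factor is taken to contribute either +1 or −1 with equal chance, the factors are independent, and the total effect is their algebraic sum. The exact distribution of the sum is then computed from binomial coefficients, and its α₃ and α₄ (taken as the usual moment ratios) are compared with the normal values 0 and 3. The function names are hypothetical.

```python
from math import comb


def sum_of_effects_distribution(n_factors):
    """Exact distribution of the sum of n independent effects of +1 or -1.

    With k positive effects among n, the sum is 2k - n, and the chance
    of exactly k positive effects is C(n, k) / 2**n.
    """
    total_outcomes = 2 ** n_factors
    return {
        2 * k - n_factors: comb(n_factors, k) / total_outcomes
        for k in range(n_factors + 1)
    }


def alphas(distribution):
    """alpha3 and alpha4 of a distribution given as {value: relative frequency}."""
    mean = sum(x * p for x, p in distribution.items())
    m2 = sum((x - mean) ** 2 * p for x, p in distribution.items())
    m3 = sum((x - mean) ** 3 * p for x, p in distribution.items())
    m4 = sum((x - mean) ** 4 * p for x, p in distribution.items())
    return m3 / m2 ** 1.5, m4 / m2 ** 2


a3_few, a4_few = alphas(sum_of_effects_distribution(4))
a3_many, a4_many = alphas(sum_of_effects_distribution(100))

print(f"  4 factors: alpha3 = {a3_few:+.3f}, alpha4 = {a4_few:.3f}")
print(f"100 factors: alpha3 = {a3_many:+.3f}, alpha4 = {a4_many:.3f}")
```

With few factors the distribution is symmetrical but flatter than normal; as the number of small independent factors is increased, α₄ moves toward 3, which is the sense in which these conditions give rise to the normal form.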

When the measures of some variable turn out to be distributed 
nonnormally, it may be that the “causes” of variation violate the 
conditions of normality, or it may be that the sample of measures, 
or the sample of individuals to be measured, has been selected in 
some special way. Let us illustrate the effect of special selection. 
The Regents’ mathematics scores of the 76 freshmen of Table II,
Appendix B, are shown in Fig. 5.11. It will be noted that the
distribution is markedly skewed to the left. This would be expected,
since the freshmen have been selected largely upon the basis of
superior scholastic aptitude. Any achievement or intelligence test
constructed for use in a less selective group would tend to yield
negatively skewed scores when administered to these freshmen.

In dealing with the complex variables of social science, it is rarely
if ever possible to demonstrate a priori that the conditions of the
normal distribution are present; hence, normality or nonnormality
must be inferred from the measures themselves. There is nothing
wrong in making such an inference, providing we recognize the
limitations involved. Most psychological and educational variables
can be measured in more ways than one, and it is quite possible
that each method of measurement will yield a unique distribution
of measures. Under these circumstances it would be presumptuous
indeed to infer that all possible measures of a variable are
distributed normally because a particular sample of measures of it
approximates that form. It can only be inferred that, since the
sample approximates normality, the population of the particular
measures under consideration is normal, or approximately so. The
inference does not carry to populations of other measures of the
variable, and it is absurd to argue that variables are distributed
normally or nonnormally without reference to the sample, the
population, and the method of measurement.

Fig. 5.11. Distribution of Regents’ mathematics scores of 76 college
freshmen, showing nonnormality resulting from selection.
(Horizontal axis: score, 66 to 99; vertical axis: frequency.)

But it is equally absurd to argue that the normality of a particu- 
lar distribution of measures of a variable is arbitrary and of little 
significance, and that an analysis of the measures which utilizes 
normal theory and technique is therefore superficial or erroneous. 
It is not unusual in educational writings to encounter such statements 
as this: 

Perhaps one of the principal reasons that we have exaggerated or 
misrepresented the importance of the normal curve in education and 
psychology is the fact that the scores obtained on educational and 
psychological tests, for almost any unselected group of pupils, so fre- 
quently present what may be roughly described as a bell-shaped form 
of distribution. This, however, is not of any very fundamental significance, 
since most of these tests are deliberately constructed so as to yield 
approximately symmetrical distributions of scores. 


Such statements imply that there is some invariant and essential 
property of a trait which transcends our measures of it. Although 
this may be so, we are limited in statistical analysis to the measures 
we obtain. When we can obtain normally distributed measures, our 
analyses are enormously facilitated. This is not to say that the 
investigation of the basic causes of the variation of a phenomenon 
is unimportant. The point stressed here is only that if a distribution 
of particular measures of the phenomenon is approximately normal, 
whatever the causes of the variation or whatever the essential 
nature of the phenomenon, the matter is of fundamental statistical 
consequence. 

Normality as an Experimental Fact. As a matter of common 
observation, the normal tendency, namely, the greater the deviation 
of a value from the mean of the series the less frequently the value 
occurs, prevails to yield at least an approximately normal dis- 
tribution in many situations and under many conditions. Smith 
and Duncan (Ref. 8, p. 297) observe: 


Natural forces appear to generate normal frequency distributions in 
many fields. Physical measurements have already been mentioned . . . 
The grades of students on examinations, hourly earnings of workers, the 
length of life of electric-light bulbs, the distance of baseball throws of 
first-year high school girls, are all normally distributed variables. In these 
fields and in many others, it would seem that the conditions of variation 
are those which theoretically give rise to the normal curve. 


It has been known for many years that errors of observation or 
measurement tend to be distributed normally. For this reason the 
normal curve is sometimes called the curve of error. If a great many 
independent measurements are made of an object, the distribution 
of the measures generally approximates normality closely, being 
concentrated about a presumably "true" value and tailing off to 
either side. This is illustrated in Fig. 5.12. It will be noted that 
positive and negative errors or deviations from the mean are present 
in approximately equal numbers and that small errors are much 
more frequent than large ones. In other words, the larger the error 
the less frequently it occurs. In later chapters we shall find the “normal 
tendency” of errors to be helpful in the interpretation of educational 
measurements. 


Fig. 5.12. Distribution of 150 measurements of the length of a table. 
(Horizontal axis: Centimeters; M = 240.31.) 


Certain aspects of statistical sampling theory are closely related 
to the theory of errors. When we have a sample from a specified 
population we may think of the mean, for example, as an estimate 
or approximation of the population mean. If we take a large number 
of samples at random from a normal population, the means of the 
samples will themselves be distributed normally. This fact can be 
proved mathematically. Of greater practical importance, however, 
is the experimental fact that the means of samples tend to be dis- 
tributed normally regardless of the form of the parent population. 

Consider one of the many experimental investigations of the 
form of distribution of means in samples drawn from nonnormal 
populations (Ref. 1, pp. 111-112). Two hundred and eighty IBM 
cards were punched with numbers corresponding to 280 scores 
distributed as follows: 

SCORE        f 
1,710       37 
  405      189 
   29       43 
   15        9 
    $        2 
TOTAL      280 


The cards were thoroughly shuffled and then placed in a tabulating 
machine. After 25 cards had been run through the tabulator, their 
total was recorded. This procedure was repeated until 1,000 samples 
of 25 scores each were obtained. The distribution of the means of 
the 1,000 samples is shown in Table 5.3 and Fig. 5.13. It will be 
noted that, despite the extreme departure from normality in the 
parent population of 280 scores, the distribution of the means of 
the 1,000 samples approximates normality rather closely. The 
student can verify that the alpha values of the distribution are 
about α₃ = .33 and α₄ = 3.05. 


Fig. 5.13. Distribution of means of 1,000 random samples of 25. 
(From Table 5.3.) 

It is an experimental fact that, if the size of the sample is about 
25 or more and the population at least 10 times as large as the 
sample, the population distribution has relatively little influence 
on the form of the distribution of random sample means. It is safe 
to conclude that the means of random samples from the sort of 
population we ordinarily have in educational research are dis- 
tributed in normal form, or nearly so. As we shall see in a later 
chapter, this fact is of great importance in sampling theory. 
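
The experiment can be imitated on a small scale. The following Python sketch is offered only as an illustration. The population of 280 scores is a hypothetical stand-in (its values and frequencies are assumed, not the card data of Ref. 1), the helper `alpha3` and the seed are likewise assumptions, and each sample of 25 is drawn without replacement, as cards from a shuffled deck would be.

```python
import random
from statistics import fmean, pstdev


def alpha3(values):
    """Skewness index: third central moment over the cubed standard deviation."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([(v - m) ** 3 for v in values]) / s ** 3


# Hypothetical, markedly skewed population of 280 scores (assumed values).
population = [1710] * 37 + [405] * 189 + [29] * 43 + [15] * 11

rng = random.Random(1)  # fixed seed so that the run is repeatable
# 1,000 random samples of 25 scores each, drawn without replacement.
sample_means = [fmean(rng.sample(population, 25)) for _ in range(1000)]

pop_skew = alpha3(population)
means_skew = alpha3(sample_means)
print(f"population:   mean={fmean(population):.1f}  alpha3={pop_skew:.2f}")
print(f"sample means: mean={fmean(sample_means):.1f}  alpha3={means_skew:.2f}")
```

Under these assumptions the skewness index of the sample means should come out far smaller than that of the population, in keeping with the experimental fact described above.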


TABLE 5.3 
DISTRIBUTION OF MEANS OF 1,000 
SAMPLES OF 25 DRAWN FROM A 
MARKEDLY NONNORMAL POPULATION 

MEAN      FREQUENCY 
840-            2 
760-            9 
680-           36 
600-          130 
520-          254 
440-          310 
360-          203 
280-           54 
200-            2 
TOTAL       1,000 
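
The alpha values can be checked directly from Table 5.3. The sketch below (Python, an illustration rather than the author's own computation) treats each entry in the MEAN column as the lower limit of an interval 80 points wide, takes the top interval to be 840-, and places every frequency at its interval midpoint. All three are assumptions.

```python
# Lower limit of each class interval in Table 5.3 -> frequency.
# The top interval (840-) is inferred from the 80-point spacing (assumption).
table_5_3 = {840: 2, 760: 9, 680: 36, 600: 130, 520: 254,
             440: 310, 360: 203, 280: 54, 200: 2}
WIDTH = 80  # assumed interval width

n = sum(table_5_3.values())
# Place each frequency at the midpoint of its interval (assumption).
midpoints = {lower + WIDTH / 2: f for lower, f in table_5_3.items()}
mean = sum(x * f for x, f in midpoints.items()) / n


def central_moment(k):
    """k-th moment of the grouped distribution about its mean."""
    return sum(f * (x - mean) ** k for x, f in midpoints.items()) / n


sigma = central_moment(2) ** 0.5
alpha_3 = central_moment(3) / sigma ** 3   # skewness
alpha_4 = central_moment(4) / sigma ** 4   # kurtosis
print(n, round(alpha_3, 2), round(alpha_4, 2))
```

The results should fall close to the .33 and 3.05 quoted above; small differences arising from grouping and rounding are to be expected.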


Summary. The normal distribution is the foundation of sta- 
tistical theory. Although it is rarely if ever possible, in dealing with 
the complex variables of social research, to show that the conditions 
which theoretically give rise to the normal distribution are present, 
it is an observable fact that a surprisingly large number of such 
variables appear to be distributed in approximately normal form. 
The theoretical justification of the normal distribution and its use 
as a model in statistical analysis, when it satisfactorily fits an 
observed distribution, are two quite different matters. Since most 
variables may be measured in more ways than one, a statement 
regarding the normality or nonnormality of a distribution has little 
meaning without reference to the sample, the population, and the 
method of measurement. 

In particular, errors of measurement and certain sample sta- 
tistics, such as the mean, appear to be distributed normally, or 
nearly so. 

Exercises 

11. Criticize the two extreme statements below. 


a. “The symmetrical or bell-shaped distribution is so nearly universal 
in statistics that it has come to be called the normal curve. 
Many scientists have come to accept with some reservation the 
view that distributions of traits and abilities from representative 
groups tend to be normal. Therefore, any serious departure from the 
normal curve is in general interpreted that the traits or abilities 
measured do not represent a random sampling of such traits or 
abilities. Consequently, if we wish to be sure that our computations 
of central tendency or variability are accurate, we must measure 
these traits or abilities in sufficient number to obtain a normal 
distribution.” 

b. "Since the conditions of the normal distribution are rarely met, 
the use of the normal curve cannot be justified in social science.” 


12. Explain by reference to the distribution of errors of measurement why 
the mean of a large number of observed weights of an object would 
approximate closely the “true” weight of the object. 

13. Suppose that it is known that a large number of sample means is 
distributed normally about a mean value 40.0 with a standard devia- 
tion of 2.0. What proportion of the means would be 42.0 or more? 
45 or more? 36 or less? 


Uses of the Normal Curve in Educational Measurement 


The normal curve has its most important application in sampling 
problems; in fact, it is the very foundation of statistical probability 
and sampling theory, as we shall see in Chapter VIII. At this point, 
we shall consider a few of the many practical uses to which the 
normal curve is put in educational measurement. In general, when- 
ever we are dealing with a variable which appears to be normally 
distributed, or which we are willing to assume would be normally 
distributed if we could measure it more precisely, normal curve 
properties may be used to advantage. Although the curve is mathematically 
complex, its application to practical problems will present 
little difficulty if the relationships between z scores, ordinates, and 
areas, as tabulated in Tables A and B, are understood. 

Use of the Normal Curve as a Model for Distributing Categorical 
Ratings. The normal curve is perhaps best known to 
teachers and students through its use as a model for distributing 
school marks and other categorical ratings. The use is based upon 
two assumptions, (1) that the variable being rated, e.g., achieve- 
ment in English, is normally distributed on a continuous scale, and 
(2) that the categories cover known intervals on the continuum. 
If a five-category marking scheme A, B, C, D, E, and equal intervals 
are employed, and if the practical limits of the z scale are considered 
to be -2.50 to +2.50, each interval will extend 1.0 standard deviation 
units. Under the two assumptions, the distribution of five-category 
marks would follow the proportions shown in Fig. 5.14. 
Since neither assumption is necessarily sound in a given class, the 
normal curve as a model for the distribution of marks should be 
used discriminatingly. The justification of its use lies in the frequently 
observed tendency of reliable measures of the achievement 
of a group of students to approach normality. But, as has been 
noted earlier, every assumption of normality needs to be carefully 
examined before normal curve theory and technique are applied. 


Fig. 5.14. The normal curve as a model for the distribution of categorical 
ratings. (z scale marked at -2.50, -1.50, -0.50, +0.50, +1.50, +2.50; 
area of an end segment, 0.0668.) 
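
The proportions for a five-category scheme can be worked out from the normal curve areas. The following Python sketch is illustrative only. It uses `NormalDist` from Python's standard library in place of Table A, and it assumes that the small areas beyond -2.50 and +2.50 are counted with the two extreme categories, which agrees with the end-segment area of 0.0668 marked in Fig. 5.14.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Boundaries between categories on the z scale. The tails beyond -2.50
# and +2.50 are folded into the extreme categories (assumption).
cuts = [-1.5, -0.5, 0.5, 1.5]
marks = ["E", "D", "C", "B", "A"]  # lowest to highest

cumulative = [0.0] + [std_normal.cdf(z) for z in cuts] + [1.0]
proportions = {mark: upper - lower
               for mark, lower, upper in zip(marks, cumulative, cumulative[1:])}

for mark, p in proportions.items():
    print(f"{mark}: {p:.4f}")
```

In a class of 100 these proportions would call for roughly 7 marks of A, 24 of B, 38 of C, 24 of D, and 7 of E, provided both assumptions hold.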

The "stanine" (standard nine) scoring scale, introduced by the 
American Air Force during World War II, illustrates an interesting 
use of the normal curve as a model for transforming test scores to a 
set of single-digit scores. (See exr. 16, p. 225.) 

Standard Scores and the Percentile System. Since the stand- 
ard deviation is the unit of measurement employed in the standard 
normal curve, the relationship between standard scores and percentiles 
in a normal distribution is the same as that between the 
z scores and proportions indicated in Table A. Hence, percentiles 
and percentile ranks in a given normal distribution are easily deter- 
mined by a table of normal curve areas. The procedure really in- 
volves nothing new; in the preceding chapter we made use of the 
relationship in interpreting standard scores. (See Table 4.10, p. 163.) 

Suppose we have a normal distribution in which M = 60.00 and 
σ = 15.00 and that we wish to find P₉₀, i.e., the point below which 
90 per cent of the scores lie. Since 50 per cent of the scores lie below 
the mean, the remaining 40 per cent lie between the mean and P₉₀; 
we therefore enter Table A at .3997 (as close as we can come to 
.4000) and find the corresponding z score to be 1.28. Since 1.28 
corresponds to a raw score of 1.28 × 15.00 + 60.00 or 79.2 in the 
given distribution, P₉₀ = 79.2. 

Now suppose we wish to find the percentile rank of a score of 40 in 
the same distribution. The z score equivalent of 40 is (40 - 60.00)/15 
or -1.33. Entering Table A at 1.33, we find the proportion of 
scores below the point to be .5000 — .4082 or .0918; hence, a score 
of 40 in the given distribution has a percentile rank of approxi- 
mately 9. 
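
Both computations can be checked without a printed table. The sketch below is an illustration in Python; `NormalDist` from the standard library stands in for Table A, and the variable names are arbitrary. Because it does not round z to two decimals before using it, its results may differ from the table work in the last digit.

```python
from statistics import NormalDist

scores = NormalDist(mu=60.0, sigma=15.0)  # M = 60.00, sigma = 15.00

p90 = scores.inv_cdf(0.90)           # point below which 90 per cent lie
rank_of_40 = 100 * scores.cdf(40)    # percentile rank of a score of 40

print(round(p90, 1), round(rank_of_40))
```

The first value agrees with the 79.2 found above, and the second with a percentile rank of approximately 9.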

Since few if any observed distributions are normal, the results 
obtained by the method illustrated above are generally somewhat 
in error, but, unless the distribution is markedly nonnormal, the 
errors tend to be negligible in practical work. 

When there is reason to believe that a population distribution of 
scores is normal or nearly so and it is desired to construct a table 
of percentile norms from the information obtained by giving a test 
to a sample, the method may give somewhat more stable norms 
than those obtained by direct computation from the actual distribution 
of scores in the sample. 

Transforming Qualitative Data. It is frequently useful to 
transform qualitative data into numerical scores. One of the commonest 
ways of making the transformation is that of assigning 
convenient small numbers to each given quality or category, as is 
done when the numbers 5, 4, 3, 2, 1 are assigned to the school marks 
A, B, C, D, E, respectively. This method is based upon the assumption 
that the numerical differences between categories are equal. 

In situations where a qualitatively ordered variable can be considered 
to be normally distributed, the normal curve provides a 
convenient and rational method of quantifying the observations 
of the variable. Suppose that teaching ability in a group of 40 
teachers has been rated by a supervisor, with results as shown 
in Table 5.4. If the assumption is reasonable that teaching ability 
is normally distributed in the given group, we may consider the 
proportions as segments of the normal curve, as shown in Fig. 
5.15. Our problem now becomes that of determining an average 
value for each of the segments of the curve demarcated by the 
proportions. We cannot take the midpoints of the z score intervals 
at the bases of the segments, since the distribution represented 
by a segment is not ordinarily symmetrical. 


Fig. 5.15. Normal curve segments demarcated by proportions. 
(From Table 5.4.) 

TABLE 5.4 
SUPERVISOR RATINGS OF 40 TEACHERS 

                  NUMBER OF           PROPORTION OF 
                  TEACHERS            TEACHERS 
RATING            RECEIVING RATING    RECEIVING RATING 
A - Excellent           10                 .250 
B - Good                 9                 .225 
C - Average             14                 .350 
D - Fair                 4                 .100 
E - Poor                 3                 .075 
SUM                     40                1.000 


In practice, either the medians or the means of the segments may 
be taken as numerical averages or scores corresponding to the categories. 
The median of a segment is easily found by determining the 
z value of the ordinate which bisects the segment. Let us find the 
medians of the two left-hand segments shown in Fig. 5.15. Since 
1/2 of the area of the .075 segment is .0375, we need only determine 
the z value of the ordinate to the left of which .0375 of the area 
lies to find the median of the .075 segment. Hence, we enter the 
table of areas, Table A, at .5000 - .0375 or .4625 and find the 
corresponding z to be 1.78 to which a negative sign attaches. (Why?) 
To find the median of the .100 segment, we must determine the z 
value of the ordinate to the left of which .075 + .100/2 or .1250 
of the area lies. Hence, we enter Table A at .3749, which is as close 
as we can come to .3750, and find the corresponding z to be -1.15. 
The other medians and the organization of work for computing 
medians are shown in Table 5.5. The procedure for finding the 
median of a segment of the normal curve may be stated: 


a. Add one-half of the proportion represented by the given segment 
to the total of proportions to the left of the segment. 

b. Find the difference between the sum and .5000. (If the sum is 
less than .5000, the sign of the median will be negative; if greater 
than .5000, the sign of the median will be positive.) 

c. In Table A find the value of z which corresponds to the difference 
and attach the proper sign. 


The z value thus found is the median of the given segment. It is, 
of course, possible to work from the upper or right end of the scale, 
if desired. The student will encounter little difficulty in finding 
medians, particularly if he first sketches the normal curve and the 
segments corresponding to the given proportions. 


TABLE 5.5 
COMPUTATION OF NORMALIZED MEDIAN VALUES OF 
SUPERVISOR RATINGS OF TEACHERS 
(Data from Table 5.4) 

RATING     PROPORTION     PROPORTION BELOW CLASS 
OR         RECEIVING      PLUS 1/2 PROPORTION 
CLASS      RATING         IN CLASS                   MEDIAN 
A            .250              .8750                   1.15 
B            .225              .6375                    .35 
C            .350              .3500                   -.39 
D            .100              .1250                  -1.15 
E            .075              .0375                  -1.78 


The procedure for finding the mean of a segment of the normal 
curve is considerably more involved than the procedure for finding 
the median, and we shall not describe it here. (See Ref. 4, p. 297.) 
In practical work the median usually is quite adequate. 
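
The median procedure of steps a to c can be sketched in Python. The sketch is illustrative only. Instead of looking up areas from the mean in Table A, it uses the inverse normal function `NormalDist().inv_cdf` from the standard library, which accepts the cumulative proportion directly, so that the subtraction from .5000 and the attaching of a sign are taken care of automatically. The function name `segment_medians` is hypothetical.

```python
from statistics import NormalDist


def segment_medians(proportions):
    """Median z of each segment, given proportions ordered from low to high."""
    std_normal = NormalDist()
    medians, below = [], 0.0
    for p in proportions:
        # Step a: proportion below the segment plus one-half of the segment.
        medians.append(std_normal.inv_cdf(below + p / 2))
        below += p
    return medians


# Proportions of Table 5.4, ordered from E (lowest) up to A (highest).
ratings = ["E", "D", "C", "B", "A"]
proportions = [0.075, 0.100, 0.350, 0.225, 0.250]

for rating, z in zip(ratings, segment_medians(proportions)):
    print(rating, round(z, 2))
```

Rounded to two decimals, the values agree with the medians obtained by the tabular procedure. A category that receives no ratings has no segment and would have to be skipped before calling the function.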

Combining Qualitative Data. The transformation described 
above is frequently useful in comparing or combining sets of quali- 
tative data, such as judges’ ratings, letter grades, and categorical 
ranks on a scale. 

It is well known that judges tend to differ substantially in their 
ratings of a group of individuals, with respect both to the rating 
of any given individual and to the proportions of the group placed 
in different categories. If it can be assumed that the rated variable 
is normally distributed, it is possible to use the transformation 
described above to arrive at an average rating for each individual, 
thus making allowance for varying standards of rating or degrees 
of leniency on the part of the judges. 

Suppose three supervisors, J, K, and L, have rated 40 teachers 
on a five-point scale with results as shown in Table 5.6. The median 
values of each supervisor's ratings shown in the table are found 
by the procedure set forth in Table 5.5. When we inspect these 
values we note rather wide variations. For example, a rating of 
“excellent” under supervisor J has a median value of .67; the same 
rating under supervisor K has a median of 1.64. This is due, of 
course, to the fact that the proportion of “excellent” ratings given 
by J is much greater than that given by K. 


TABLE 5.6 
RATINGS AND NORMALIZED MEDIAN VALUES OF RATINGS 
OF 40 TEACHERS BY 3 SUPERVISORS 

                                   SUPERVISOR 
                   J                    K                    L 
RATING       N  PROPORTION  Mdn    N  PROPORTION  Mdn    N  PROPORTION  Mdn 
Excellent   20     .500      .67    4     .100     1.64    0     .000 
Good        10     .250     -.32   10     .250      .76   12     .300     1.04 
Average      5     .125     -.89   12     .300      .00   20     .500     -.13 
Fair         5     .125    -1.53   10     .250     -.76    8     .200    -1.28 
Poor         0     .000             4     .100    -1.64    0     .000 
SUM         40    1.000            40    1.000            40    1.000 

Now consider 2 given teachers in the group of 40 who have re- 
ceived, let us say, these ratings: 


SUPERVISOR J SUPERVISOR K SUPERVISOR L 
Teacher A Excellent Good Average 
Teacher B Average Excellent Good 


In determining a composite or average for teacher A, using the median 
values of the ratings as shown in Table 5.6, we have (.67 + 
.76 - .13)/3 or .43 and for teacher B, (-.89 + 1.64 + 1.04)/3 or 
.60. If each rating had received the same weight, the 2 teachers 
would have had equal average ratings. 
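
The whole computation, from the judges' tallies to the composite ratings, can be sketched as follows. The Python code is an illustration under stated assumptions: the tallies are those of Table 5.6 (supervisor K's count of 4 "poor" ratings is taken from the .100 proportion), the inverse normal function of the standard library replaces Table A, and the names `median_values`, `scales`, and `composites` are hypothetical.

```python
from statistics import NormalDist, fmean

CATEGORIES = ["Poor", "Fair", "Average", "Good", "Excellent"]  # low to high


def median_values(counts):
    """Map each category a judge used to the median z of its segment."""
    total = sum(counts.values())
    values, below = {}, 0
    for category in CATEGORIES:
        n = counts.get(category, 0)
        if n:  # a category that was never used has no segment
            values[category] = NormalDist().inv_cdf((below + n / 2) / total)
        below += n
    return values


# Numbers of teachers placed in each category by supervisors J, K, and L.
counts = {
    "J": {"Excellent": 20, "Good": 10, "Average": 5, "Fair": 5},
    "K": {"Excellent": 4, "Good": 10, "Average": 12, "Fair": 10, "Poor": 4},
    "L": {"Good": 12, "Average": 20, "Fair": 8},
}
scales = {judge: median_values(tally) for judge, tally in counts.items()}

received = {
    "Teacher A": {"J": "Excellent", "K": "Good", "L": "Average"},
    "Teacher B": {"J": "Average", "K": "Excellent", "L": "Good"},
}
composites = {
    teacher: fmean([scales[judge][rating] for judge, rating in given.items()])
    for teacher, given in received.items()
}
print({teacher: round(value, 2) for teacher, value in composites.items()})
```

Because the medians are not rounded before they are averaged, a composite may occasionally differ in the second decimal from one worked by hand; for these two teachers the rounded results are .43 and .60.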

It should be noted that in many cases, perhaps the majority, 
the ratings of judges are hardly reliable enough to warrant refined 
treatment, the variation in the ratings itself being presumptive 
evidence of unreliability. However, whether or not it improves the 
reliability of average ratings, the method of combining ratings 
described above has three distinct advantages. First, it yields a 
quantitative average; second, it does not presuppose equal differ- 
ences between categories; and finally, it makes allowance for varying 
standards of rating on the part of the judges. 

Strictly speaking, in combining qualitative data it would be 
better to use the normalized mean values of the segments or cate- 
gories, since the mean values are algebraic in nature, but the refine- 
ment ordinarily is unwarranted in view of the nature of the original 
data. 

Normalizing Numerical Data. The normal curve may be used 
to normalize numerical as well as qualitative data. Consider the 
Arithmetic Problems Test scores of the 293 eighth-grade pupils of 
Table I, Appendix B, now shown in Table 5.7 and Fig. 5.16. The 
distribution appears to be multimodal, somewhat platykurtic and 
skewed to the right. Let us assume that the form of the distribution 
is not due to sampling fluctuations, or better, let us suppose that 
quite similar distributions of scores were observed when the same 
test was given to other groups of eighth-graders. 


TABLE 5.7 
COMPUTATION OF NORMALIZED SCORES OF 293 EIGHTH-GRADERS 
IN ARITHMETIC PROBLEMS TEST 
(Data from Table I, Appendix B) 

                    CUM f BELOW SCORE      VALUE OF z IN 
                    PLUS 1/2 FREQUENCY     NORMAL CURVE 
RAW           CUM   AT SCORE               CORRESPONDING 
SCORE    f     f    NO.      PROPORTION    TO PROPORTION    T SCORE   Z SCORE 
 29      2    293   292        .9966            2.71           77        73 
 28      1    291   290½       .9915            2.39           74        72 
 27      2    290   289        .9864            2.21           72        70 
 26      3    288   286½       .9778            2.01           70        69 
 25      4    285   283        .9659            1.82           68        67 
 24      8    281   277        .9454            1.60           66        66 
 23     14    273   266        .9079            1.33           63        64 
 22      6    259   256        .8737            1.14           61        63 
 21     13    253   246½       .8413            1.00           60        61 
 20      9    240   235½       .8038             .86           59        60 
 19     14    231   224        .7645             .72           57        58 
 18      9    217   212½       .7253             .60           56        57 
 17     13    208   201½       .6877             .49           55        55 
 16      6    195   192        .6553             .40           54        54 
 15     17    189   180½       .6160             .30           53        52 
 14     15    172   164½       .5614             .15           52        51 
 13     19    157   147½       .5034             .01           50        49 
 12     13    138   131½       .4488            -.13           49        48 
 11     19    125   115½       .3942            -.27           47        46 
 10     15    106    98½       .3362            -.42           46        45 
  9     14     91    84        .2867            -.56           44        43 
  8     14     77    70        .2389            -.71           43        42 
  7     18     63    54        .1843            -.90           41        40 
  6     10     45    40        .1365           -1.10           39        39 
  5      6     35    32        .1092           -1.23           38        37 
  4     11     29    23½       .0802           -1.40           36        36 
  3      3     18    16½       .0563           -1.59           34        34 
  2      7     15    11½       .0392           -1.76           32        33 
  1      4      8     6        .0205           -2.04           30        31 
  0      4      4     2        .0068           -2.47           25        30 


It is often the case, when the distribution of the scores on an educational 
or psychological test is stable but lacking in normality, 
that a somewhat more discriminating test results if the raw scores 
are normalized, and the normal score equivalents of the raw scores 
used. 

The procedure in normalizing numerical data is exactly like that 
in normalizing qualitative data by determining median values of 
ordered categories, and the underlying assumptions are the same. 
Returning to Table 5.7, if the raw scores were ordered by letters 
instead of numbers, the correspondence between the two procedures 
would stand out clearly. In Table 5.7, as in Table 5.5, the work 
proceeds from the lower or left end of the scale, but it is possible to 
work from the upper or right end if desired. 


Fig. 5.16. Frequency polygon of distribution of 
raw scores. (From Table 5.7.) 
(Horizontal axis: Score, 0 to 30; vertical axis: Frequency.) 

The effect of normalizing the raw scores is to shrink the raw score 
scale in some places and elongate it in others. The effect stands 
out clearly when the normalized z equivalents of the raw scores are 
plotted, as shown in Fig. 5.17. It will be noted that the difference 
between the raw scores 0 and 4 corresponds to a z difference of 1.07, 
while the difference between 12 and 16 corresponds to a z difference 
of .53. Other quite disproportional differences can be singled out. 
The disproportionality of course results from the assumption that 
the ability measured by the Arithmetic Problems Test is distributed 
normally. The raw scores must have varying values on the measuring 
scale, if the proportions of pupils at the various scores are 
to correspond to those of the normal distribution. If the assumption 
of normality is true, the test presumably yields better measures of 
the ability when the raw scores are normalized. 


Fig. 5.17. z Equivalents and corresponding raw 
scores. (From Table 5.7.) 
(z equivalents marked: -2.47, -1.76, -1.40, -0.71, -0.13, +0.40, 
+0.86, +1.60, +2.71.) 
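
The stretching and shrinking of the raw score scale can be verified numerically. The following Python sketch is an illustration only: the frequencies are those read from Table 5.7, the function name `normalized_z` is hypothetical, and the inverse normal function of the standard library is used in place of Table A. Each z is rounded to two decimals before differences are taken, as in the table work.

```python
from statistics import NormalDist

# Frequencies for raw scores 0 through 29, as read from Table 5.7.
freqs = [4, 4, 7, 3, 11, 6, 10, 18, 14, 14, 15, 19, 13, 19, 15,
         17, 6, 13, 9, 14, 9, 13, 6, 14, 8, 4, 3, 2, 1, 2]
n = sum(freqs)  # 293 pupils


def normalized_z(freqs):
    """z for each raw score: normal deviate of (cum f below + 1/2 f) / N."""
    total, below, out = sum(freqs), 0, []
    for f in freqs:
        out.append(NormalDist().inv_cdf((below + f / 2) / total))
        below += f
    return out


z = [round(v, 2) for v in normalized_z(freqs)]  # z[x] belongs to raw score x
print(z[0], z[4], z[12], z[16])
print(round(z[4] - z[0], 2), round(z[16] - z[12], 2))
```

The two differences printed on the last line correspond to the z differences of 1.07 and .53 cited above for equal raw score distances of four points.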

There is another noteworthy advantage in normalizing raw scores. 
As was pointed out in Chapter IV, standard or z scores of two or 
more distributions are comparable, provided the distributions are 
normal. Since, in effect, they normalize the distributions, normalized 
z scores are always comparable. When the scores of individuals in 
several tests need to be combined into composite scores, normalizing 
the raw scores yields truly comparable measures which may be 
combined into composites of optimum fairness, provided the under- 
lying assumption of normality is tenable for each set of scores. 

T Scores and Z Scores. The z values of normalized scores, such 
as those shown in Table 5.7, always involve negative signs and 
decimals. Since these are inconvenient to work with, z values usually 
are multiplied by one constant and added to a second constant. 
When a median value of z in a segment of the normal distribution 
is multiplied by 10 and added to 50, the resulting score is universally 
known as a "T score."* 

* The term T score was originated by McCall (Ref. 6) in honor of Thorndike 
and Terman, pioneers in the application of statistics to educational 
measurement. 

Let us look at the column headed "T Score" in Table 5.7. Since 
the median z value of the proportion of pupils scoring 29 on the 
test is 2.71, the T score is 10 × 2.71 + 50 or 77, rounded off to the 
nearest whole number. The student can verify the other T scores 
in the column. Thus, if T scores were used instead of raw scores 
in the Arithmetic Problems Test, the 2 top pupils would have 
scores of 77 instead of 29, the 4 bottom pupils would have scores 
of 25 instead of 0, the 6 pupils at raw score 16 would have scores of 
54, and so on. 

It is interesting to compare T scores with Z scores. It will be 
recalled from Chapter IV that a Z score is defined by 


Z = 10z + 50, 


in which z is a standard score in a distribution, whether the distribution 
is normal or otherwise. (Throughout the present chapter, 
we have been concerned with z scores in the special case of the 
normal distribution.) 

In order to determine the Z scores corresponding to the raw 
scores shown in Table 5.7 we must first determine the mean and 
standard deviation of the raw scores. The student can verify that 
these are M = 13.36 and σ = 6.66. Hence, the Z score corresponding 
to the raw score 29 is 10(29 - 13.36)/6.66 + 50 or about 73; the 
Z score corresponding to the raw score 28 is 72, and so on, as 
entered in the last column of Table 5.7. Were it not for rounding, 
the differences between successive Z scores would be constant. 
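
The contrast between the two kinds of score can be seen by computing both from the same frequencies. The sketch below is illustrative Python, not the author's worksheet. The frequencies are those read from Table 5.7, the standard deviation is computed with N (not N - 1) in the denominator, which is an assumption consistent with the value 6.66, and the dictionary names are hypothetical.

```python
from statistics import NormalDist

# Frequencies for raw scores 0 through 29, as read from Table 5.7.
freqs = [4, 4, 7, 3, 11, 6, 10, 18, 14, 14, 15, 19, 13, 19, 15,
         17, 6, 13, 9, 14, 9, 13, 6, 14, 8, 4, 3, 2, 1, 2]
n = sum(freqs)
scores = range(len(freqs))

mean = sum(x * f for x, f in zip(scores, freqs)) / n
# Standard deviation with N in the denominator (assumption).
sd = (sum(f * (x - mean) ** 2 for x, f in zip(scores, freqs)) / n) ** 0.5

t_scores, z_scores, below = {}, {}, 0
for x, f in zip(scores, freqs):
    z_normal = NormalDist().inv_cdf((below + f / 2) / n)  # normalized z
    t_scores[x] = round(10 * z_normal + 50)               # T score
    z_scores[x] = round(10 * (x - mean) / sd + 50)        # linear Z score
    below += f

print(round(mean, 2), round(sd, 2))
for x in (28, 16, 0):
    print(x, t_scores[x], z_scores[x])
```

The T scores follow the form of the normal curve, whereas the Z scores keep the spacing of the raw scores; the two sets agree near the middle of this distribution and part company toward the extremes.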

As was shown in Chapter IV, the mean of a set of Z scores is 50 
and the standard deviation 10. However, the conversion to Z scores 
does not change the form of the distribution, and the units on the 
Z scale are proportional to the raw score scale throughout. In the 
special case of the normal distribution, Z scores are identical with 
T scores; the more the given distribution departs from normality 
the more marked are the differences between the two. The relationship 
between Z scores, raw scores, and T scores of the distribution 
of Table 5.7 is brought out in Fig. 5.18. 

Since T scores are based upon the z scores of the normal curve, the 
T scores of two or more distributions are comparable and combinable. 
Z scores, however, are truly comparable only if the distributions 
are normal or show similar departures from normality. 


Fig. 5.18. Z scores, raw scores, and T scores of the same distribution. 
(From Table 5.7.) 


The Relative Difficulty of Test Items. It is possible to consider 
two-category data, such as the right and wrong responses to a 
single test item, as normally distributed. Suppose that .95 or 95 
per cent of a group responds correctly to a test item and .05 or 
5 per cent incorrectly. If it is reasonable to assume that the ability 
elicited by the item is normally distributed, we may represent the 
correct and incorrect responses as proportions or areas of two 
segments under the normal curve, as shown in Fig. 5.19. In this 
situation correct responses, although necessarily or conveniently 
scored “1,” are considered to represent amounts of ability ranging 
from barely enough to pass the item to enough to pass it with the 
greatest of ease; and incorrect responses, to range from practically 
no ability to almost enough ability to pass the item. 


Fig. 5.19. Proportions passing and failing an item testing a 
normally distributed ability. 

The interpretation is reasonable in a great many testing situ- 
ations, since it is reasonable to suppose that an item which measured 
the ability under examination precisely would yield a distribution 
of scores more or less normal in form. (We need not be concerned 
here with the fact that we rarely if ever can devise such items.) If 
the item of Fig. 5.19 were such an item, the individual item scores 
would fall along the z-scale of the normal curve, and the division 
on the ability scale between “almost enough ability to pass" and 
“just enough ability to pass” the item would be at z = -1.64. This 
point can always be determined if we know the proportion passing 
an item, and it provides a useful and rational index of the difficulty 
of the item. 

Suppose now that a test has been given to a sample of individuals 
and that the proportions of individuals responding correctly and 
incorrectly to four of the items are: 


            PROPORTION     PROPORTION 
ITEM        PASSING        FAILING 
a             .95            .05 
b             .65            .35 
c             .35            .65 
d             .05            .95 


Obviously, item a is the least difficult, since .95 or 95 per cent of 
the group responded correctly to it, and item d the most difficult. 
In terms of proportions passing, the increase in difficulty from 
items a to d is constant. Under the assumption that the ability 
elicited by the items is normally distributed, however, the pro- 
portions passing and failing the items represent areas of segments 
under the normal curve, and the difficulties do not vary constantly, 
but as the z values shown in Fig. 5.20. Bringing together these 
data, we have: 


                                   DIFFICULTY DEFINED BY 
            DIFFICULTY DEFINED     z VALUE OF ORDINATE 
            BY PROPORTIONS         SEPARATING SUCCESSFUL AND 
ITEM        PASSING                UNSUCCESSFUL PROPORTIONS 
a               .95                      -1.64 
b               .65                       -.39 
c               .35                        .39 
d               .05                       1.64 


The difficulty indexes in the last column are sometimes called sigma 
indexes of item difficulty. 
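
A sigma index is simply the z value below which the failing proportion of the normal curve lies. The short Python sketch below illustrates the computation; it assumes the ability is normally distributed, uses the inverse normal function of the standard library in place of Table A, and the function name `sigma_index` is hypothetical.

```python
from statistics import NormalDist


def sigma_index(proportion_passing):
    """z of the ordinate separating those who fail (left) from those who pass."""
    return NormalDist().inv_cdf(1 - proportion_passing)


passing = {"a": 0.95, "b": 0.65, "c": 0.35, "d": 0.05}
difficulty = {item: round(sigma_index(p), 2) for item, p in passing.items()}
print(difficulty)
```

Equal steps of .30 in the proportion passing thus correspond to unequal steps on the z scale: 1.25 between items a and b, but only .78 between items b and c.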

If we have a large number of items whose normalized indexes of 
difficulties are known, we may construct a test having items spaced 
equally along the scale of a normally distributed ability (see exr. 26, 
p. 227). The chief drawback in the construction and use of such a 
refined instrument, however, is that proportions passing a single 
item tend to fluctuate quite widely from sample to sample; hence, 
the scaled values of the items tend to lack stability. 

The normal curve provides only one among several methods of 
scaling test items for difficulty. The general topic of scaling the 
items of a test or a questionnaire according to some theoretical 
model has a great many ramifications. The interested student is 
referred to Guilford (Ref. 2) and Lindquist et al. (Ref. 5) for further 
reading. 


Fig. 5.20. Difficulties of test items in terms of z values which represent 
amounts of a normally distributed ability sufficient to pass the 
respective items. 

Closely related to the problem of estimating the difficulty of test 
items is the problem of weighting item scores. The majority of 
educational and psychological tests are composed of a number of 
items carrying equal credit, and an individual’s score is simply the 
total number of correct responses, or some multiple of the number, 
the implicit assumptions being that the item credits are additive in 
a linear manner and that each correct item reflects the same amount 
of the ability under examination as any other correct item. When 
the items are of unequal difficulty, as they frequently are, the latter
assumption is clearly not tenable, and item weights varying in some 
rational manner with item difficulties would seem to be logical and 
useful. Thus, it would seem logical to weight items a, b, c, and d,
in the illustration above, by the weights -1.64, -.39, .39, and
1.64, or some simple derivatives from them such as 1, 2.25, 3.03,
and 4.28, or, in round numbers, 1, 2, 3, and 4. However, as has been
pointed out, proportions passing an item fluctuate widely from 
sample to sample, and weights tend to have little stability in prac- 
tice. Furthermore, there is ordinarily little practical advantage to 
be gained by weighting the items in a test for difficulty. The weighted 
and unweighted scores from a test containing a large number of 
items tend to correlate almost perfectly.

Summary. The normal curve has a great many practical appli- 
cations in educational measurement. It is a convenient model for 
distributing categorical ratings and for transforming raw test 
scores to small, comparable numbers. It provides the simplest 
rational method available of scaling test items for difficulty. It can 
be used in transforming and combining qualitative data and in 
normalizing numerical data. Various derived scores, such as T
scores, are based upon the z and area relationships of the normal
curve.

In general, when a variable is normally distributed, normal curve 
properties can be utilized in refining gross measures of the variable. 
It should be kept constantly in mind, however, that if a variable 
is not normally distributed, the use of the curve in refining gross
measures not only is unwarranted but actually introduces a source
of error. Every assumption of normality needs to be carefully 
scrutinized. 

The normal curve has several interesting applications in
correlation work. These will be taken up in the following chapter.


Exercises 


14. The arguments advanced in favor of the use of the normal curve as a 
model for the distribution of school marks include the points: 


a. Achievement, like scholastic aptitude, tends to be normally
distributed.
b. Without such a model, teachers might be either too lax or too
stringent in their distribution of marks.
c. The model tends to make the marks of different teachers comparable.


Evaluate the above points. Summarize other points for and against 
the normal curve as a model for the distribution of categorical ratings. 


15. An alternate scheme of distributing school marks is one based upon
use of the normal curve in estimating exceptionalness. Under this
scheme, a standard score of, say, 2.00 or more would be judged
exceptional and rated A; one of, say, -2.00 or less would be rated F;
one falling between 1.00 and 2.00, rated B; and so on. (a) What sorts
of measures of achievement are needed in this scheme? (b) Does the
scheme assume normality of achievement?

16. The "stanine" scoring scheme consists essentially of assigning values
1, 2, ..., 8, 9 to the proportions falling within 1/2 z intervals under
the normal curve, as shown below. (a) What percentage (to nearest
whole number) of a group would receive each stanine score? (b) What
assumption underlies the stanine scheme? Under what condition
should stanines not be used?


Stanine     1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9
z scale      -1.75 -1.25  -.75  -.25  +.25  +.75 +1.25 +1.75


17. Devise an eleven-point scoring scheme based upon the proportions
falling within .4z intervals under the normal curve.

18. The distribution of the 293 Arithmetic Fundamentals Test scores of
Table I, Appendix B, is given below. The mean and σ of the
distribution are 30.13 and 8.49 respectively. By use of the proportions and
z score relationships of Table A, (a) approximate the percentiles Poo, 
Pa, Pos, and Р and (b) estimate the percentile rank of the scores 
10, 22, 41, and 50. (c) Compare the estimated values with the exact 
values determined by the usual method of finding percentiles and 
percentile ranks. 
SCORE     f        SCORE     f
51-53     1        27-29    41
48-50     3        24-26    31
45-47     5        21-23    19
42-44    19        18-20    16
39-41    18        15-17    12
36-38    31        12-14     4
33-35    37         9-11     3
30-32    49         6-8      1
                    3-5      3


19. Under what conditions would percentile test norms, as estimated in 
exr. 18 above, be preferred to the actual percentiles determined from 
the given distribution? 

20. In a study of the relation of home conditions to school achievement, 
an investigator was able to classify 100 homes as follows: 


CLASS                        NUMBER
A   Definitely superior        12
B   Above average              31
C   Average                    30
D   Definitely inferior        27


Suggest two different ways of transforming the ratings into numerical 
scores, and state the assumptions underlying both. 

21. Find the median values of the segments of the normal curve demarcated 
by the proportions in exr. 20, above. 

22. The ratings of 30 student orations by 4 judges were as follows: 


RATING       FIRST JUDGE   SECOND JUDGE   THIRD JUDGE   FOURTH JUDGE
Excellent        10              2             15              4
Good              3              1             10              7
Medium            5             20              5             10
Fair             10              3              0              6
Poor              2              4              0              3


Suggest several methods of combining the ratings, state the assump- 
tions underlying each, and describe the advantages and disadvantages 
of each. 

23. Find the medians of the segments of the normal curve demarcated by 
the proportions in exr. 22, above. Using the median values, determine 
the composite ratings of orations rated by the judges as follows: 


ORATION   FIRST JUDGE   SECOND JUDGE   THIRD JUDGE   FOURTH JUDGE
a         Excellent     Good           Medium        Good
b         Good          Excellent      Excellent     Medium
c         Fair          Medium         Medium        Fair
24. The distribution of the raw scores of 47 dental school applicants on
the Picture Completion Subtest of the Wechsler-Bellevue Intelligence
Scale is shown below.


SCORE     f
15        4
14       10
13       11
12       11
11        6
10        1
 9        2
 8        1
 7        1

N = 47


a. Normalize the data, and find the T scores and Z scores corresponding 
to the raw scores. 

b. What are two possible advantages to be gained in normalizing the 
scores? 

c. What is the basic assumption underlying the transformation? 

d. How can the usefulness of the transformation be demonstrated? 


25. Five problems have been solved by 45 per cent, 55 per cent, 65 per
cent, 75 per cent, and 85 per cent, respectively, of a large class in
algebra. If the assumption is made that the ability to solve algebra
problems is distributed normally, what are the sigma indexes of difficulty
of the problems?

26. A test constructor administered a large number of items of varying
degrees of difficulty to 50 individuals in order to construct a 41-item
test such that the sigma indexes of difficulty would be -2.0, -1.9,
..., +1.9, +2.0; i.e., such that the 41 items would increase
uniformly in difficulty throughout the interval -2.0 to +2.0 on the z
scale.


a. What assumptions must he make? 


b. How would he proceed ? 
c. What would be the advantages of the scaled test? The limitations? 


27. Summarize what seem to you to be the advantages and the limitations
of utilizing the normal curve in educational measurement. When and 
only when can the use of the curve be defended? 
Chapter VI


Correlation and Regression 


IT IS the fundamental faith of science that the world is
understandable. In the search for understanding of man and his affairs,
the most frequent question that arises concerns possible relation- 
ships between phenomena. Such questions as whether parental 
income is related to child continuation in school, whether test 
intelligence is related to academic success, whether supply of eco- 
nomic goods is related to price, whether broken homes are related 
to delinquency, are inevitable and endless. The explanation and 
prediction of natural and social phenomena necessarily rest upon 
demonstrable relationships. Variables which are not related to one 
or more other variables are of little importance. The central task 
of any branch of science is that of discovering and measuring rela- 
tionships through comparisons of sets of data. As new relationships 
are found, understanding of the world is increased; when existing 
relationships permit prediction of events, control over the environ- 
ment is extended. 

We can imagine primitive man accidentally discovering, say, that 
fertilizing his plants increased their yield and that the more fertilizer 
he added, up to a point, other factors being favorable, the more the 
increase in yield. We can imagine him discovering that those among 
his fellows who learned one thing easily tended to learn other things 
easily as well. Some time in the distant past he discovered that 
certain things were associated with his well-being, others with sick- 
ness. His welfare and the progress of his society depended upon 
finding out how things were associated or related to other things. 
Knowledge of relationships has always been the key to understand- 
ing and controlling the environment. 

Statistical Correlation. The term correlation is loosely used to
refer to any sort of relationship between objects or events. Such
phrases as “the correlated curriculum,” “the correlation of subjects
in the curriculum,” and “the correlation between fact and theory”
all refer to relationships. In statistics, however, the term correlation
refers exclusively to relationships between variables which can be
quantified. The situation in which statistical correlation is applicable
is always one in which there is a pair of measures for each individual
or instance in a given group. In order to apply the method of
correlation to determining whether height and weight are related, we
must have the heights and weights of numerous individuals; in
order to study the relationship between per capita cost of instruction
and size of school, we must have costs and sizes from numerous
schools; in order to study the relationship between rainfall and
crop yield, we must have rainfalls and yields in a number of
instances. When the relationship is such that large values of one variable
tend to be associated with large values of the other, the correlation
is positive; when large values of one tend to be associated with small
values of the other, the correlation is negative. When data consist of
pairs of measures, they are technically known as bivariate data.

When the two variables comprising bivariate data are correlated,
either may be spoken of as the correlative of the other.

Since statistical correlation provides a quantitative method by
which relationships can be investigated, it is a most useful tool in
social research.

Correlation in the Social Sciences. In the natural sciences the 
correlations between many phenomena tend to be perfect, or 
nearly so. Thus, the expansion of mercury is so highly correlated 
with heat that, for ordinary ranges of temperature, the mercury 
thermometer serves as a reliable instrument for measuring heat. 
As temperature increases, the volume of mercury increases
proportionately. The amount of silver deposited by an electric current
in a unit of time is, in theory, perfectly correlated with the strength 
of the current. In the natural sciences the correlatives of a particular 
variable are usually so marked that they can be determined by 
ordinary observation or by experiment, and the correlation so nearly 
perfect that it can be stated as a law. 

In the biological and social sciences, correlations are much less 
marked and are nearly always weakened by exceptional instances, 
even when the variables can be satisfactorily measured. For
example, weight of men is a positive correlative of height of men,
but the correlation is weakened by “short and heavy” and “tall and 
light” men. The correlation between amount of money in circulation 
and employment, between rainfall and crop yield, between chrono- 
logical age and mental development, between test intelligence and 
school marks, although significant, is far from perfect. 

The student may well ask, if social phenomena tend to show quite 
imperfect relationships, why correlation is used in social research.
If the alternative to statistical correlation were controlled experi- 
mentation, the investigator would be indeed foolish to employ the 
former. But the alternative in many situations is guessing or the 
intuitive process of analysis known as judgment. Professor Fisher 
has observed (Ref. 1, pp. 175-176): 


No quantity has been more characteristic of biometrical work than 
the correlation coefficient, and no method has been applied to such 
various data as the method of correlation. Observational data in par- 
ticular, in cases where we can observe the occurrence of various possible 
contributory causes of a phenomenon, but cannot control them, has been 
given by its means an altogether new importance. ... 

One of the earliest and most striking successes of the method of cor- 
relation was in the biometrical study of inheritance. At a time when 
nothing was known of the mechanism of inheritance, or of the structure 
of the germinal material, it was possible by this method to demonstrate 
the existence of inheritance, and to “measure its intensity”; and this in
an organism in which experimental breeding could not be practiced, 
namely, Man. By comparison of the results obtained from the physical 
measurements in man with those obtained from other organisms, it was 
established that man's nature is not less governed by heredity than that 
of the rest of the animate world. The scope of the analogy was further 
widened by demonstrating that correlation coefficients of the same mag- 
nitude were obtained for the mental and moral qualities in man as for 
physical measurements. 

These results are still of fundamental importance, for not only is in- 
heritance in man still incapable of experimental study, and existing 
methods of mental testing are still unable to analyze the mental disposi- 
tion, but even with organisms suitable for experiment and measurement, 
it is only in the most favorable cases that the several factors causing 
fluctuating variability can be resolved, and their effects studied, by
Mendelian methods.*

* Reprinted from R. A. Fisher, Statistical Methods for Research Workers;
published 1950 by Oliver and Boyd, Ltd., Edinburgh, by permission of the
author and publishers.

In correlation studies in the biological and social sciences, we
ordinarily must observe many instances or cases of the related
phenomena and then take a sort of average measure of the
relationship present; the more instances, of course, the more reliable the
measure. (Later we will see that the sampling error of a measure of
correlation is inversely proportional to the square root of the number
of cases; that, as always, the larger the sample, other things being
equal, the more confidence we can place in the information it
provides.) Statistical correlation refers to the average amount of
relationship between two variables determined by investigation of a
number of instances or cases of the relationship.

Most variables in social research are enormously complex, in the
sense of having numerous correlatives. It is generally the case that 
the greater the number of variables which are associated with a given 
variable, the less the latter tends to be correlated with a single one. 
When this is the case, there is little possibility of exerting sufficient 
controls to permit experimental study of the relationship between 
two variables. At best, only the general tendencies can be observed. 


Exercises 


1. Can you think of any facts or variables unrelated to others? Is
information about these worth anything at present?
2. Can you think of any relationships between variables which are not
useful?
3. Suppose that a research worker wished to determine whether
conservative attitudes were related to chronological age. How would he proceed?
4. From your own observation what are some variables related to
intelligence? Low income? Continuation in school? Scholastic achievement?
5. Consider the statement, “So many accepted relationships have been
proved false that the study of relationships is futile in social science.”
6. What would you expect to be the nature of the correlation in each of
the following? State other variables which may affect the relationship
in each.

a. Achievement in reading and achievement in spelling.
b. Life of automobile tires and speed of driving.
c. Salaries and years of experience in a given school system.
d. Amount of unemployment and retail sales.
e. Size of head and intelligence.
f. Husbands’ and wives’ heights.
g. Speed of sound and temperature.
h. Amount of sunshine and crop yield.
i. Size of vocabulary and ability to learn.
j. Hours of study and number of statistics exercises worked.

7. Name several pairs of variables, not listed above, which you believe
would be (a) positively related; (b) unrelated; (c) negatively related.



The Product-Moment Coefficient of Correlation


The most widely used and best measure of correlation is the 
product-moment coefficient, developed by the English statistician, 
Karl Pearson, about 1900. The nature of the coefficient can be 
brought out clearly as we consider a practical problem. 


TABLE 6.1
SCORES OF 18 EIGHTH-GRADE PUPILS ON TESTS OF
READING COMPREHENSION AND VOCABULARY
(Data from school B, table I, appendix B)

          READING   VOCABULARY   PRODUCT OF
PUPIL        X          Y          PAIR XY        X²         Y²
026         57         52           2,964       3,249      2,704
029         48         43           2,064       2,304      1,849
027         48         32           1,536       2,304      1,024
032         47         40           1,880       2,209      1,600
024         40         44           1,760       1,600      1,936
028         39         42           1,638       1,521      1,764
031         39         32           1,248       1,521      1,024
037         39         30           1,170       1,521        900
025         37         30           1,110       1,369        900
038         35         24             840       1,225        576
040         34         26             884       1,156        676
033         32         27             864       1,024        729
036         30         30             900         900        900
035         30         23             690         900        529
039         28         22             616         784        484
030         26         19             494         676        361
023         24         34             816         576      1,156
034         24         26             624         576        676

SUM        657        576          22,098      25,415     19,788
MEAN      36.5       32.0


Suppose we wish to determine the amount of relationship between
reading and vocabulary in, say, the eighth grade and suppose that
we have pairs of scores for a group of pupils, as shown in Table 6.1.
Inspection of the table indicates that the pupils who have high
scores in the reading test tend to have high scores in the vocabulary
test. This tendency stands out clearly when the 18 pairs of scores
are plotted as dots in the scatter diagram of Fig. 6.1. For the most
part, scores above the mean in one test are paired with scores above
the mean in the other, and scores below the mean in one with scores
below the mean in the other. Moreover, the dots tend to fall in a
linear pattern extending from the lower left corner to the upper
right corner of the figure. (If they fell in a straight diagonal line, of
course, perfect correlation would be indicated.)

Fig. 6.1. Graphical presentation of paired raw
scores. (From Table 6.1.)

In order to describe the amount of correlation characterizing the
data, we need a measure which is sensitive to the extent to which
high scores in reading are paired with high scores in vocabulary,
intermediate with intermediate, and low with low; i.e., the extent
to which the two series of scores vary together. Let us see how such
a measure can be derived from the sum of products of pairs of
deviation scores.

The Sum of Products of Deviation Scores as a Measure of
Correlation. The raw scores of Table 6.1 are shown as deviation
scores in Table 6.2. It will be noted that in each case the mean has
been subtracted from the score, as is always done in converting raw
scores to deviation scores. The products of the pairs of
deviation scores with their algebraic signs are shown in column 4 of
the table.


TABLE 6.2
DEVIATION SCORES OF 18 EIGHTH-GRADE PUPILS ON
TESTS OF READING COMPREHENSION AND VOCABULARY
(Data from table 6.1)

          READING   VOCABULARY   PRODUCT        SQUARE
PUPIL        x          y           xy          x²         y²
026       +20.5      +20.0       +410.0       420.25     400.00
029       +11.5      +11.0       +126.5       132.25     121.00
027       +11.5        0.0          0.0       132.25       0.00
032       +10.5      + 8.0       + 84.0       110.25      64.00
024       + 3.5      +12.0       + 42.0        12.25     144.00
028       + 2.5      +10.0       + 25.0         6.25     100.00
031       + 2.5        0.0          0.0         6.25       0.00
037       + 2.5      - 2.0       -  5.0         6.25       4.00
025       + 0.5      - 2.0       -  1.0          .25       4.00
038       - 1.5      - 8.0       + 12.0         2.25      64.00
040       - 2.5      - 6.0       + 15.0         6.25      36.00
033       - 4.5      - 5.0       + 22.5        20.25      25.00
036       - 6.5      - 2.0       + 13.0        42.25       4.00
035       - 6.5      - 9.0       + 58.5        42.25      81.00
039       - 8.5      -10.0       + 85.0        72.25     100.00
030       -10.5      -13.0       +136.5       110.25     169.00
023       -12.5      + 2.0       - 25.0       156.25       4.00
034       -12.5      - 6.0       + 75.0       156.25      36.00

SUM          0          0      +1,074.0      1,434.5    1,356.0

Fig. 6.2. Graphical presentation of paired deviation
scores. (From Table 6.2.)

Now the sum of the products, shown at the foot of the table, is a
sensitive measure of the extent to which reading scores are
associated with vocabulary scores of proportional magnitude. If the
association were stronger, the sum would be increased; if weaker,
the sum would be decreased. For example, if the association were
made stronger by interchanging the vocabulary scores of pupils 027
and 024, the sum of products would be 1,170.0 instead of 1,074.0.
On the other hand, if the association were made weaker by
interchanging the vocabulary scores of pupils 025 and 039, the sum of
products would be 1,002.0 instead of 1,074.0. The student should
experiment with other favorable and unfavorable interchanges until
he is satisfied that the sum of products of deviation scores is
sensitive to the amount of association or correlation between reading
and vocabulary scores.
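
The interchanges described above can be tried out with a few lines of Python. This is only a sketch for experimenting with the data of Table 6.1; the helper names `sum_of_products` and `swap_vocabulary` are illustrative and are not part of the text.

```python
# (pupil, reading X, vocabulary Y), copied from Table 6.1
scores = [
    ("026", 57, 52), ("029", 48, 43), ("027", 48, 32), ("032", 47, 40),
    ("024", 40, 44), ("028", 39, 42), ("031", 39, 32), ("037", 39, 30),
    ("025", 37, 30), ("038", 35, 24), ("040", 34, 26), ("033", 32, 27),
    ("036", 30, 30), ("035", 30, 23), ("039", 28, 22), ("030", 26, 19),
    ("023", 24, 34), ("034", 24, 26),
]


def sum_of_products(rows):
    """Sum of products of paired deviation scores."""
    n = len(rows)
    mean_x = sum(x for _, x, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    return sum((x - mean_x) * (y - mean_y) for _, x, y in rows)


def swap_vocabulary(rows, pupil_1, pupil_2):
    """Return a copy of the rows with two pupils' vocabulary scores interchanged."""
    y_of = {pupil: y for pupil, _, y in rows}
    y_of[pupil_1], y_of[pupil_2] = y_of[pupil_2], y_of[pupil_1]
    return [(pupil, x, y_of[pupil]) for pupil, x, _ in rows]


original = sum_of_products(scores)
stronger = sum_of_products(swap_vocabulary(scores, "027", "024"))
weaker = sum_of_products(swap_vocabulary(scores, "025", "039"))
print(original, stronger, weaker)  # 1074.0 1170.0 1002.0
```

Other pairs of pupils can be passed to `swap_vocabulary` to see which interchanges raise the sum and which lower it.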

In addition to being sensitive to the amount of correlation, the
sum of products of deviation scores indicates whether the
correlation is positive or negative. This is clearly seen when we examine
the deviation scores plotted in the scatter diagram of Fig. 6.2. The
signs of the products of pairs of deviation scores are positive in
quadrants I and III and negative in quadrants II and IV. The
majority of dots lie in the former two quadrants and the sum of
products is positive. If the majority of dots had fallen in quadrants
II and IV, the sum of products would have been negative. This
would have been the case if reading scores had been inversely
associated with vocabulary scores. (If the dots had been distributed
evenly over the four quadrants, the sum of products would have
been zero, indicating absence of correlation.) Thus, the sum of
products reflects both the amount and the direction of correlation.
Although we have developed this idea with reference to a particular 
set of data, the argument may readily be extended to any set of 
quantitative bivariate data. 
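
The quadrant argument can be illustrated by counting the dots of Table 6.1 quadrant by quadrant. The sketch below assumes the usual algebraic numbering of quadrants (I upper right, II upper left, III lower left, IV lower right); dots lying exactly on an axis have a zero product and are counted separately.

```python
reading = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
vocabulary = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]

mean_x = sum(reading) / len(reading)        # 36.5
mean_y = sum(vocabulary) / len(vocabulary)  # 32.0

counts = {"I": 0, "II": 0, "III": 0, "IV": 0, "on axis": 0}
for big_x, big_y in zip(reading, vocabulary):
    x, y = big_x - mean_x, big_y - mean_y   # deviation scores
    if x == 0 or y == 0:
        counts["on axis"] += 1              # product is zero
    elif x > 0 and y > 0:
        counts["I"] += 1                    # product positive
    elif x < 0 and y > 0:
        counts["II"] += 1                   # product negative
    elif x < 0 and y < 0:
        counts["III"] += 1                  # product positive
    else:
        counts["IV"] += 1                   # product negative

print(counts)
```

Thirteen of the 18 dots fall in quadrants I and III, which agrees with the positive sum of products.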

Although sensitive to the amount and direction of correlation, 
the sum of products of deviation scores is limited as a measure of 
correlation because it is independent neither of the size of the group 
nor of the units of measurement. If there had been more than 
18 pupils in the illustrative problem, the sum of products would 
have been affected, whether or not the additional pairs of scores 
contributed to the amount of correlation. Furthermore, the sum 
of products would have been affected if the scores on the tests had 
been systematically larger or smaller or if the unit of measurement 
had been different. If, for example, each item in the vocabulary 
test had counted 1/2 instead of 1, the sum would have been 537.0 
instead of 1,074.0. If the reading test had been twice as long and 
if the pupils had done proportionally as well on the longer test, 
the sum would have been doubled. As a more general illustration, 
if the heights and weights of a group of men were recorded in the 
metric system, the sum of products of deviation heights and weights 
would be quite different from that which would be obtained if 
heights and weights were recorded in the English system. 
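
The dependence of the sum of products on the unit of measurement is easy to verify numerically. The following sketch rescales the scores of Table 6.1 in the two ways just described; the variable names are illustrative only.

```python
reading = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
vocabulary = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]


def sum_of_products(xs, ys):
    """Sum of products of paired deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


base = sum_of_products(reading, vocabulary)
# Each vocabulary item counts 1/2 instead of 1.
half_credit = sum_of_products(reading, [y / 2 for y in vocabulary])
# Reading test twice as long, pupils doing proportionally as well.
double_length = sum_of_products([2 * x for x in reading], vocabulary)

print(base, half_credit, double_length)  # 1074.0 537.0 2148.0
```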

Before the sum of products of deviation scores can be used as a 
general measure of correlation, it is necessary to introduce refine- 
ments which will eliminate the effects of size of sample and unit of 
measurement. These refinements are discussed in the following 
paragraphs. 

The Mean Product of Standard Scores as a Measure of
Correlation. The effect of sample size on the sum of products of
deviation scores $\sum xy$ can easily be eliminated by dividing by the
number $N$ in the sample (the number of pairs of scores). The
quotient $\sum xy / N$, being the mean of the products, is independent of
size of sample.

If we can now free the quotient $\sum xy / N$ from the effect of the
units of measurement, we shall arrive at a perfectly general measure
or coefficient of correlation. In an earlier chapter we learned that a
standard score is independent of the unit in which the original
measurement is made. Since this is true, it remains only to divide
$\sum xy / N$ by $\sigma_x$ and $\sigma_y$ to arrive at a perfectly general measure of
correlation. In other words, a measure or coefficient of correlation
between two variables, independent of the size of the sample and the
units of measurement, can be determined by dividing the mean product
of the paired deviation scores by the standard deviations of the scores.
It will be seen that this procedure is equivalent to finding the mean
product of paired standard scores, although ordinarily the standard
scores are not actually computed.

Let us illustrate the procedure by determining the coefficient of 
correlation of the data of Table 6.2. The mean of the products of 
paired deviation scores is 1,074/18 or 59.67. The standard deviations
are $\sqrt{1{,}434.5/18}$ and $\sqrt{1{,}356/18}$ or 8.93 and 8.68. When we
perform the successive divisions we obtain .77. Theoretically, .77 is
the coefficient of correlation we would obtain between reading 
comprehension and vocabulary, no matter how many comparable 
eighth-grade pupils we measured with the given tests and no matter 
whether we scored correct responses 1 or 2 or 1/2 and so on. Prac- 
tically, of course, the unreliability of the tests and sampling fluctu- 
ations would affect the coefficient. 
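
The equivalence between this procedure and the mean product of paired standard scores can be checked directly. The sketch below uses Python's standard `statistics` module, with `pstdev` giving the standard deviation computed with N in the denominator as in the text; the function name is illustrative.

```python
from statistics import fmean, pstdev

reading = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
vocabulary = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]


def mean_product_of_standard_scores(xs, ys):
    """Convert both series to standard scores and average the paired products."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)  # divide by N, not N - 1
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return fmean(products)


r = mean_product_of_standard_scores(reading, vocabulary)
# Scoring each vocabulary item 1/2 instead of 1 leaves the coefficient unchanged.
r_half_credit = mean_product_of_standard_scores(reading, [y / 2 for y in vocabulary])

print(round(r, 2), round(r_half_credit, 2))  # 0.77 0.77
```

Unlike the sum of products, the coefficient is unaffected by the change of unit.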

Computation of the Product-Moment Coefficient of
Correlation. We may summarize the above procedure in the formula

$$r_{xy} = \frac{\sum xy}{N \sigma_x \sigma_y} \tag{6.1}$$

This is the basic formula for the Pearson product-moment coefficient
of correlation, commonly designated by $r_{xy}$. Since $\sigma_x = \sqrt{\sum x^2 / N}$
and $\sigma_y = \sqrt{\sum y^2 / N}$, the basic formula may be written

$$r_{xy} = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} \tag{6.2}$$

Using the latter formula in our example (see Table 6.2 for sums),
we have

$$r_{xy} = \frac{1{,}074.0}{\sqrt{(1{,}434.5)(1{,}356.0)}} = .77$$

It is usually the case in practical correlation problems that the
means of the scores turn out to be decimal numbers. When this is
the case, the deviation score formulas (6.1) and (6.2) will involve
rather tedious arithmetic. If we substitute $X - \bar{X}$ for $x$ and $Y - \bar{Y}$
for $y$ in formula (6.2), expand, substitute $\sum X / N$ and $\sum Y / N$ in
terms containing $\bar{X}$ and $\bar{Y}$, and simplify, we obtain the raw score
formula

$$r_{xy} = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{[N \sum X^2 - (\sum X)^2][N \sum Y^2 - (\sum Y)^2]}} \tag{6.3}$$

Let us apply formula (6.3) to the raw scores of Table 6.1. The
values for substitution are:

$N \sum XY = 18 \times 22{,}098 = 397{,}764$,
$(\sum X)(\sum Y) = 657 \times 576 = 378{,}432$,
$N \sum X^2 = 18 \times 25{,}415 = 457{,}470$,
$(\sum X)^2 = (657)^2 = 431{,}649$,
$N \sum Y^2 = 18 \times 19{,}788 = 356{,}184$,
$(\sum Y)^2 = (576)^2 = 331{,}776$.

Substituting in the formula we have

$$r_{xy} = \frac{397{,}764 - 378{,}432}{\sqrt{(457{,}470 - 431{,}649)(356{,}184 - 331{,}776)}} = .77$$

In summary, in computing $r_{xy}$ from deviation scores $x$ and $y$, the
data are organized and treated as shown in Table 6.2 and the
appropriate substitutions are made in either formula (6.1) or (6.2). In
computing $r_{xy}$ from raw scores $X$ and $Y$, the data are organized
and treated as shown in Table 6.1 and substitutions are made in
formula (6.3).
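
As a check on the arithmetic, the raw score formula (6.3) and the deviation score formula (6.2) can be computed side by side. The sketch below works from the scores of Table 6.1; the variable names are illustrative, and the two results should agree apart from rounding in the machine arithmetic.

```python
from math import sqrt

reading = [57, 48, 48, 47, 40, 39, 39, 39, 37, 35, 34, 32, 30, 30, 28, 26, 24, 24]
vocabulary = [52, 43, 32, 40, 44, 42, 32, 30, 30, 24, 26, 27, 30, 23, 22, 19, 34, 26]
n = len(reading)

# Raw score formula (6.3): only sums of raw scores, squares, and products.
sum_x, sum_y = sum(reading), sum(vocabulary)
sum_xy = sum(x * y for x, y in zip(reading, vocabulary))
sum_x2 = sum(x * x for x in reading)
sum_y2 = sum(y * y for y in vocabulary)

numerator = n * sum_xy - sum_x * sum_y
term_x = n * sum_x2 - sum_x ** 2
term_y = n * sum_y2 - sum_y ** 2
r_raw = numerator / sqrt(term_x * term_y)

# Deviation score formula (6.2), for comparison.
mean_x, mean_y = sum_x / n, sum_y / n
dev_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reading, vocabulary))
dev_x2 = sum((x - mean_x) ** 2 for x in reading)
dev_y2 = sum((y - mean_y) ** 2 for y in vocabulary)
r_dev = dev_xy / sqrt(dev_x2 * dev_y2)

print(numerator, term_x, term_y)          # 19332 25821 24408
print(round(r_raw, 2), round(r_dev, 2))   # 0.77 0.77
```

The raw score form keeps every intermediate value a whole number until the final division, which is what makes it convenient when the means are decimal numbers.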

The student is urged not to follow the computational procedures 
blindly, but to keep in mind the purpose of the procedure, namely, 
that of determining a sensitive measure of the relationship between 
two variables, a measure based upon the products of pairs of scores. 

Accuracy of $r_{xy}$. The various formulas above indicate that the
product-moment coefficient of correlation is a complex function of
the scores. For this reason there is no generally satisfactory rule for
determining the number of decimal places to retain in a computed
value of $r_{xy}$. It is rather common practice, however, to retain two
decimal places in $r_{xy}$, and that practice is recommended here. In a
large sample an additional place may be justified, but this ordinarily
is unimportant in practical work.


Exercises 


8. Write the deviation scores close to the corresponding dots in Fig. 6.2,
following the algebraic convention of writing the abscissa values first.
By this convention the dot in quadrant II would be labeled -12.5, +2,
i.e., the dot represents deviation scores of x = -12.5 and y = +2.
In which quadrant are both x and y positive? Negative? x positive,
y negative? y positive, x negative?

9. Suppose you have 8 pairs of scores. Indicate by graphs (similar to 
that of Fig. 6.1 or 6.2) the pattern the scores would make if they 
were characterized by (a) perfect positive correlation; (b) imperfect 
positive correlation; (c) zero correlation; (d) imperfect negative cor- 
relation; (e) perfect negative correlation. 

10. Perfect positive correlation is present when the members of each pair
of standard scores are equal. Show that for perfect positive correlation
$r_{xy}$ = +1.00.

11. Perfect negative correlation is present when the members of each pair
of standard scores have the same absolute value, but differ in sign.
Show that for perfect negative correlation $r_{xy}$ = -1.00.

12. Construct a general proof or give a numerical example to show that
decreasing all of the X scores or all of the Y scores by a constant does
not affect the value of $r_{xy}$.

13. Find the coefficient of correlation between reading and vocabulary
in one of the schools of Table I, Appendix B. (If different members 
of the class select different schools, the various coefficients can be 
compared.) 

14. It is frequently said that, unless pupils can read well, they do badly 
in other school subjects. How would you find out in which subjects 
this state of affairs tends to be most marked? What are the limitations 
to generalizations or conclusions you might make? 

15. The head of a college foreign language department once said that high 
school language grades were the best single predictor of college suc- 
cess. What evidence bearing upon this assertion can you obtain from 
the data in Table II, Appendix B? How would you have to qualify any
conclusion you might make? 

16. Suggest other correlational studies of the data in Tables I and II, 
Appendix B, which might be made. State the limitations of each. 
Meaning of Correlation


In a later section we shall deal with several technical points which
arise in interpreting a coefficient of correlation. At this time let us 
consider the meaning of correlation from a nontechnical point of 
view. 

Although it is generally rather difficult to compute, there is
nothing mysterious about the product-moment coefficient of
correlation. It is merely a sensitive measure of the amount of
association or relationship between two sets of scores, i.e., between
two variables. Since it is independent of the number in the sample and
of the unit of measurement, it is an abstract measure. A coefficient
of +1.00 indicates perfect positive correlation; one of -1.00, perfect
negative correlation. Perfect correlations, even in the exact sciences,
exist only theoretically.

The larger the absolute value of the coefficient, the more marked
the relationship between the variables. The coefficient is a complex
function of the degree of relationship, however, and two coefficients
cannot be compared directly as ratios. A coefficient of .80, for
example, indicates more than twice as much relationship as one of .40.
Later we shall see that the strength of association increases
disproportionately as $r_{xy}$ approaches 1.00 or -1.00. A negative
coefficient indicates the same amount of relationship as a positive
coefficient of the same magnitude; however, the former indicates
inverse relationship whereas the latter indicates direct relationship.
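The claim that the coefficient is independent of the unit of measurement can be checked numerically. The following is a minimal sketch, not part of the original text: the heights and weights are hypothetical illustrative values, and `pearson_r` is a helper name introduced here that applies the raw-score product-moment formula used in this chapter's worked examples.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Raw-score product-moment coefficient of correlation."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    )


# Hypothetical heights (inches) and weights (pounds), for illustration only.
inches = [60, 62, 65, 68, 70, 72]
pounds = [110, 125, 130, 150, 155, 175]

r_original = pearson_r(inches, pounds)

# Change both units of measurement: the coefficient should not change.
r_metric = pearson_r([2.54 * x for x in inches], [0.4536 * y for y in pounds])

# Reverse the direction of one scale: same magnitude, opposite sign.
r_reversed = pearson_r(inches, [-y for y in pounds])

print(round(r_original, 4), round(r_metric, 4), round(r_reversed, 4))
```

The first two printed values agree, and the third differs only in sign, which is the sense in which a negative coefficient indicates the same amount of relationship as a positive one of equal magnitude.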

Correlation and Causation. The fact that correlation may de- 
pend upon the extent to which one variable is affected by another 
squarely brings up the question of causation, a question which 
always arises when relationships are observed and one which per- 
meates all science. 

The principle of causality is involved and elusive: we shall make 
no attempt to treat it adequately here. But some of its simpler im- 
plications need to be considered before the student can appreciate 
the power of correlational methods in research. 

When we use the term cause we ordinarily are referring to a suffi- 
cient reason for the occurrence of an event, and hence are thinking 
about an orderly and invariable sequence, effect preceded by cause. 
The ordinary interpretation of causal relationship is something like 
this: c and e are related as cause and effect if e occurs after c and
if e does not occur when c is absent. Hence, when we seek causal
relationships we are really seeking an invariable sequence of events.
Such relationships are the sine qua non of understanding and con- 
trolling the environment, since they enable us to explain and predict 
events. 

The fact of correlation does not demonstrate sequence, and there- 
fore does not indicate which of two related variables is cause, which 
effect. When variable X is correlated with Y, and the correlation
is not accidental, there are three reasonable interpretations: (1) X 
is the cause or part of the cause of Y, (2) Y is the cause or part of 
the cause of X, and (3) X and Y are caused or partially caused by 
some third variable or set of variables. Correlation does not indicate 
which one of the three interpretations is sound in a given situation; 
it demonstrates only that X and Y are associated. Inferences regarding
the direction and nature of causation can be made, if at all, only
from information supplementary to the fact of correlation. 

In spite of this limitation, however, correlation is extremely useful 
in preliminary investigation of causal relationships. It is generally 
the case that variables which are causally related show correlation 
and that variables which do not show correlation are not related 
causally. Hence, the method of correlation serves both to single 
out variables which may be relevant to an observed effect and to 
eliminate variables which are irrelevant. 

In later sections we shall find that correlation is useful in
prediction, whether or not causation can be demonstrated.

Assumptions Underlying Correlation. Thus far nothing has
been said about the major assumption underlying the use of the
correlation coefficient $r_{xy}$ as a measure of relationship between
two variables. This is the assumption of "linearity," which merely
means that the data, when plotted, tend to follow a straight line
at least as closely as they follow, say, a U-shaped or J-shaped curve.
If the data tend to follow some curve other than a straight line,
$r_{xy}$ underestimates the amount of relationship. We shall return to
the assumption of linearity in connection with regression.
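A small numerical sketch may make the underestimate concrete. The data below are hypothetical and chosen for illustration: in the first set Y is determined exactly by X along a U-shaped curve, in the second along a straight line. The helper `pearson_r` is a name introduced here, not one from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Raw-score product-moment coefficient of correlation."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    )


x = [-3, -2, -1, 0, 1, 2, 3]

# Perfect but U-shaped relationship: y is fixed once x is known.
y_curve = [v * v for v in x]

# Perfect linear relationship for comparison.
y_line = [2 * v + 1 for v in x]

r_curve = pearson_r(x, y_curve)
r_line = pearson_r(x, y_line)

print(r_curve, r_line)
```

Although Y is completely determined by X in both sets, the coefficient for the U-shaped data is zero, while the coefficient for the straight-line data is +1.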

The condition of linearity is important for another reason. The
products of the deviation scores x and y are summed in computing
$r_{xy}$, i.e., the products are treated as additive. This implies that
the original X units are comparable throughout the X-scale and the Y
units throughout the Y-scale. When test data are nonlinear the
comparability of the units is particularly questionable.

While no assumption relating to the form of the distributions of
the correlated variables underlies the product-moment method,
most of the uses to which an $r_{xy}$ is put do presuppose normality of
the distributions, and the student is cautioned that an $r_{xy}$
computed for other than normally distributed data will have little
usefulness except as a rough descriptive measure.


Exercises 


17. Consider again the variables in exr. 6, page 232. Comment on possible 
causal relationships between or underlying each pair of variables. 

18. Suppose the question came up whether humidity of atmosphere is 
related causally to suicide rates. Could anything be proved or disproved 
by correlational methods? 

19. Consider several pairs of possibly correlated variables in Tables I and 
II, Appendix B. Comment on possible causal relationships. 

20. The figures below are based upon data reported by World Almanac, 
1950, 1954. Would $r_{xy}$ fairly measure the relationship between public
school enrollment and per capita expenditure? Explain. 


PUBLIC SCHOOL ENROLLMENT PER CAPITA EXPENDITURE 
YEAR IN MILLIONS IN DOLLARS 
1880 9.87 1.91 
1890 12.72 11.04 
1900 15.50 13.87 
1910 17.81 23.93 
1920 21.58 48.02 
1930 25.68 90.22 
1940 25.43 92.16 
1950 25.11 232.47 


Special Applications of Product-Moment Correlation 


The product-moment method of correlation, although used
principally in the case of continuous measures, is, under certain
conditions, applicable both to discrete quantitative data and to
qualitative data.

Rank Difference Correlation. It frequently happens in
statistical work that the variables whose relationships we wish to
investigate are available only in order of merit, importance, or some
other quality. In other situations it may be desirable to assign
ranks to individuals who have been originally measured on two
continuous variables, recording ranks instead of actual scores to
indicate performance.

In all such cases, it is conventional to assign the number 1 to the
first, best, or highest in each series, 2 to the second, and so on. If 
there is correlation between the two variables, of course, the ranks 
will tend to correspond. 

Consider the data in Table 6.3 which are the ratings of 8 debaters
on clarity of argument and fluency of speech.


TABLE 6.3

RATINGS OF 8 DEBATERS ON CLARITY OF ARGUMENT
AND FLUENCY OF SPEECH

           CLARITY   FLUENCY
DEBATER       X         Y        XY      X²      Y²       D      D²
   A          1         3         3       1       9      -2       4
   B          2         1         2       4       1       1       1
   C          3         6        18       9      36      -3       9
   D          4         4        16      16      16       0       0
   E          5         2        10      25       4       3       9
   F          6         8        48      36      64      -2       4
   G          7         7        49      49      49       0       0
   H          8         5        40      64      25       3       9
  SUM        36        36       186     204     204       0      36


We may treat the ranks as raw scores and employ formula (6.3) to find
the correlation between clarity of argument and fluency of speech as
judged. Thus,

$$r_{xy} = \frac{(8 \times 186) - (36 \times 36)}{\sqrt{8 \times 204 - (36)^2}\,\sqrt{8 \times 204 - (36)^2}} = \frac{192}{336} = .57.$$


A simple formula is available for the computation of the product-
moment coefficient of correlation where the data are in rank order:

$$r_d = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}, \tag{6.4}$$

in which $r_d$ is used to indicate $r_{xy}$ from ranked data, D is the
difference between ranks, and N is, as usual, the number of pairs of
scores or ranks. A derivation of the formula may be found in Ref. 6.

The last columns of Table 6.3 include the values of D and D² for
the illustrative problem. Substituting in the formula,

$$r_d = 1 - \frac{6 \times 36}{8(64 - 1)} = 1 - \frac{216}{504} = .57.$$

When X and Y denote the ranks of a group of individuals on two
variables and when there are no ties for position, $r_d$ gives the
Pearson product-moment coefficient of correlation of the ranked data.
(This coefficient frequently is designated by the Greek letter ρ, rho.)
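The agreement between the rank difference formula and the raw-score product-moment formula can be verified on the ranks of Table 6.3. This is a sketch for checking the arithmetic only; the function names are introduced here for illustration.

```python
from math import sqrt

# Ranks of the 8 debaters from Table 6.3.
clarity = [1, 2, 3, 4, 5, 6, 7, 8]
fluency = [3, 1, 6, 4, 2, 8, 7, 5]


def pearson_r(xs, ys):
    """Raw-score product-moment coefficient (the form of formula 6.3)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    )


def rank_difference_r(xs, ys):
    """Rank difference coefficient: 1 - 6*sum(D^2) / (N(N^2 - 1))."""
    n = len(xs)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


r_xy = pearson_r(clarity, fluency)
r_d = rank_difference_r(clarity, fluency)
print(round(r_xy, 2), round(r_d, 2))
```

Both computations give .57 for these data, as expected when there are no tied ranks.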

In practical situations, ties in ranks frequently occur, particularly 
when ranks are imposed on variables which have been measured 
originally by some other method. Consider the data in Table 6.4 
which are the order of finishing a 50-item statistics test and the 
scores on the test for 15 students. When we attempt to assign ranks 
to the scores, we encounter difficulty. Students ranking sixth and 
tenth in order of finishing tie for ranks 5 and 6 on the scores; and 
students 3, 7, and 11 in order of finishing tie for ranks 7, 8, and 9. 
In case of ties it is conventional to assign the mean of the ranks 
tied for to the ties. The mean of 5 and 6 is 5.5 and this is the rank 
assigned to each of the scores of 36. The mean of 7, 8, and 9 is 8, 
so 8 is assigned to each of the three scores of 35. The computation 
of $r_d$ for the data in Table 6.4 is left as an exercise for the student.
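The convention for tied ranks can be expressed as a short routine. The sketch below assigns rank 1 to the highest score and gives tied scores the mean of the ranks they occupy; `average_ranks` is a name introduced here for illustration. It reproduces only the rank column of Table 6.4 and leaves the computation of the coefficient to the reader.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest, giving tied values
    the mean of the ranks they jointly occupy."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # first rank position occupied by v
        count = ordered.count(v)       # number of scores tied at v
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


# Scores from Table 6.4, listed in order of finishing (1st through 15th).
scores = [45, 21, 35, 38, 40, 36, 35, 34, 18, 36, 35, 30, 28, 48, 24]

ranks = average_ranks(scores)
print(ranks)
```

The two scores of 36 each receive rank 5.5 and the three scores of 35 each receive rank 8, as described in the text.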

The rank difference method of correlation is of wide usefulness.
In addition to providing a method of determining the precise amount
of relationship between two variables for which we have only ranks,
it provides a convenient and rapid method of estimating $r_{xy}$ in the
continuous measures situation. When we have fewer than about
30 pairs of measures, it is usually less laborious to assign ranks to
the measures and compute $r_d$ rather than $r_{xy}$. When the measures
are fairly well distributed over their ranges with relatively few ties,
$r_d$ gives a close approximation to $r_{xy}$. Horn (Ref. 3) provides a
correction for the effect of tied ranks on the value of $r_d$.

TABLE 6.4

ON A 50-ITEM STATISTICS TEST

                                       RANK
ORDER OF                RANK ORDER     DIFFERENCE
FINISHING    SCORES     OF SCORES      D             D²
    1          45           2
    2          21          14
    3          35           8
    4          38           4
    5          40           3
    6          36           5.5
    7          35           8
    8          34          10
    9          18          15
   10          36           5.5
   11          35           8
   12          30          11
   13          28          12
   14          48           1
   15          24          13


Product-Moment Biserial Correlation. There are numerous
situations in which one of the two variables whose relationship is
of concern can be observed only in two amounts or categories. For
example, suppose we are interested in the relationship between
test intelligence and survival in school of college freshmen during a
particular year. We can, of course, measure test intelligence in the
usual manner, but survival in school is perhaps most defensibly
observed in the two categories, "drop-outs" and "stay-ins." Similarly,
we might be interested in determining whether there is association
between "taking books home" and achievement in school,
"reading comic books" and test intelligence, or "passing a
particular item on a test" and performance on the whole test.

There are many such problems in the social sciences, and,
although there generally are better statistical methods than biserial
correlation for dealing with them, it is sometimes desirable to have
a measure of relationship between the variables involved.

Let us see how we can obtain a measure of relationship between a
dichotomous (two-division) and a continuous variable. The data
in Table 6.5 were obtained by giving a 30-item reasoning test to
12 graduate students enrolled in a course in research methods. Four
of the students had studied formal logic as undergraduates. No
better measure of the study of formal logic was available than "had
studied" and "had not studied." In Table 6.5, "had studied" is
assigned a numerical value of 1, and "had not studied," a value of 0.


TABLE 6.5

SCORES ON A REASONING TEST OF 12 GRADUATE
STUDENTS, 4 OF WHOM HAD STUDIED LOGIC

STUDY OF     REASONING
LOGIC X      TEST Y         XY       X²        Y²
   1            27          27        1        729
   0            25           0        0        625
   1            22          22        1        484
   0            20           0        0        400
   0            20           0        0        400
   0            18           0        0        324
   1            18          18        1        324
   0            18           0        0        324
   0            15           0        0        225
   1            12          12        1        144
   0            10           0        0        100
   0            10           0        0        100
SUM 4          215          79        4      4,179


Let us now compute the usual product-moment $r_{xy}$ for the data in
Table 6.5. When we substitute the sums in the last row of the table,
remembering that N = 12, in formula (6.3) we have

$$r_{xy} = \frac{(12 \times 79) - (215 \times 4)}{\sqrt{12 \times 4 - (4)^2}\,\sqrt{12 \times 4{,}179 - (215)^2}},$$

so that $r_{pb}$ = .25.


There is a simple formula available for the product-moment
correlation coefficient in the biserial situation

$$r_{pb} = \frac{(\bar{Y}_1 - \bar{Y}_0)\sqrt{pq}}{\sigma_y}, \tag{6.5}$$

in which $\bar{Y}_1$ is the mean Y score of the individuals in the upper,
positive, or "1" category; $\bar{Y}_0$ is the mean Y score of the
individuals in the other category; p is the proportion of individuals
in the upper, positive, or "1" category; q the proportion of
individuals in the other category, so that p + q = 1; and $\sigma_y$ is
the standard deviation of the continuous measures Y. A derivation of
the formula is included in Appendix A.

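For the above problem,

$$\bar{Y}_1 = \frac{27 + 22 + 18 + 12}{4} = \frac{79}{4} = 19.75,$$

$$\bar{Y}_0 = \frac{215 - 79}{8} = \frac{136}{8} = 17.00,$$

$$p = \frac{4}{12} = \frac{1}{3}, \qquad q = \frac{8}{12} = \frac{2}{3},$$

$$\sigma_y = \frac{1}{12}\sqrt{12 \times 4{,}179 - (215)^2} = 5.22,$$

so that

$$r_{pb} = \frac{(19.75 - 17.00)\sqrt{1/3 \times 2/3}}{5.22} = .25.$$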


This product-moment coefficient of biserial correlation is
customarily called point biserial and is indicated by $r_{pb}$ to
distinguish it from another biserial coefficient, $r_b$, which is based
upon the assumption that the dichotomous variable (in the above
illustrative problem, the study of logic) is normally distributed.
This is our next topic.
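That formula (6.5) gives the same value as the ordinary product-moment computation can be checked on the data of Table 6.5. The sketch below is an arithmetic check only; the variable and function names are introduced here for illustration, and the standard deviation is the population form (divisor N) used in the text.

```python
from math import sqrt
from statistics import mean, pstdev

# Table 6.5: X = 1 if the student had studied logic, 0 otherwise.
logic = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
score = [27, 25, 22, 20, 20, 18, 18, 18, 15, 12, 10, 10]


def pearson_r(xs, ys):
    """Raw-score product-moment coefficient (the form of formula 6.3)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    )


# Ingredients of the point biserial formula (6.5).
upper = [y for x, y in zip(logic, score) if x == 1]
lower = [y for x, y in zip(logic, score) if x == 0]
p = len(upper) / len(score)
q = 1 - p
sigma_y = pstdev(score)  # standard deviation with divisor N

r_pb = (mean(upper) - mean(lower)) * sqrt(p * q) / sigma_y
r_xy = pearson_r(logic, score)

print(mean(upper), mean(lower), round(sigma_y, 2), round(r_pb, 2), round(r_xy, 2))
```

The two routes agree, each giving .25 when rounded to two places.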

Normalized Biserial Correlation. In the biserial situation, if
the assumption is made that the dichotomous variable is normally
distributed, the properties of the normal curve may be utilized to
derive the following formula, a proof of which is given in Ref. 6,
p. 362:

$$r_b = \frac{(\bar{Y}_1 - \bar{Y}_0)\,pq}{y'\,\sigma_y}, \tag{6.6}$$

in which $\bar{Y}_1$, $\bar{Y}_0$, $\sigma_y$, p, and q are defined as in
formula (6.5) above, and y' is the ordinate of the normal curve which
separates the proportions p and q.

To illustrate the use of (6.6) we return to the data of Table 6.5.
The values for substitution in the formula are, as determined above,

$$\bar{Y}_1 = 19.75, \qquad \bar{Y}_0 = 17.00,$$

$$p = \tfrac{1}{3} = .3333, \qquad q = .6667,$$

$$\sigma_y = 5.22, \qquad y' = .3637,$$

the value of y' being found by entering the table of normal areas,
p. 557, at .1664 (as close as we can come to .1667, the difference
between .6667 and .5000) and taking the corresponding z value
.43 into the table of ordinates. Substituting in (6.6),

$$r_b = \frac{(19.75 - 17.00)(1/3 \times 2/3)}{5.22 \times .3637} = .32.$$


Before leaving the example let us note that ordinarily we would not
compute a biserial coefficient of correlation for so small a group.
Both $r_{pb}$ and $r_b$ are relatively much affected by sampling
fluctuations and consequently are not dependable in small samples.
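The table lookup for y' can be replaced by a direct evaluation of the normal curve. The sketch below is one way to do this, assuming the standard library's `statistics.NormalDist`; it uses the exact point separating p and q rather than the tabled z of .43, so the ordinate differs slightly in the fourth decimal from the tabled value. All names are introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

# Table 6.5: X = 1 if the student had studied logic, 0 otherwise.
logic = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
score = [27, 25, 22, 20, 20, 18, 18, 18, 15, 12, 10, 10]

upper = [y for x, y in zip(logic, score) if x == 1]
lower = [y for x, y in zip(logic, score) if x == 0]
p = len(upper) / len(score)
q = 1 - p
sigma_y = pstdev(score)  # standard deviation with divisor N

# Ordinate of the unit normal curve at the point that cuts off the
# proportions p and q (the role played by y' in formula 6.6).
unit_normal = NormalDist()
z_cut = unit_normal.inv_cdf(q)
ordinate = unit_normal.pdf(z_cut)

mean_diff = mean(upper) - mean(lower)
r_b = mean_diff * p * q / (ordinate * sigma_y)    # formula (6.6)
r_pb = mean_diff * sqrt(p * q) / sigma_y          # formula (6.5)

print(round(z_cut, 2), round(ordinate, 4), round(r_b, 2), round(r_pb, 2))
```

For these data the normalized coefficient is .32 against a point biserial value of .25.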

Either formula (6.5) or (6.6) may be used, of course, in biserial
correlation, but with somewhat different results. The value of $r_b$
generally exceeds that of $r_{pb}$. The question of which gives the
better results in a given problem is a difficult one. In general, if
the assumption is sound that normality underlies the dichotomous
variable, $r_b$ theoretically is the appropriate coefficient. While not
a product-moment coefficient of biserial data, $r_b$ presumably gives a
better estimate of the value of $r_{xy}$ which would be observed if
continuous and normal measures of dichotomous variables were available.
It can be argued, however, that in applied statistics the real problem
is that of devising continuous and normal measures of dichotomous
variables, rather than computing coefficients of relationship which
might be observed if these existed.

In practical work $r_b$ has one important advantage over $r_{pb}$. There
are tables available which make it possible to estimate $r_b$ quickly
and with sufficient accuracy for many practical purposes. We will
use such a table later in connection with test item analysis, p. 364.

The most common application of biserial correlation is in test
item analysis, in which the dichotomous variable is "pass-fail" on a
particular item and the continuous variable the scores on the whole
test. However, biserial correlation has various other uses, as noted
at the beginning of the section.


Exercises 


21. Compute $r_d$ for the data you used in exr. 13 and compare with $r_{xy}$.

22. Suppose you were the teacher of the eighth-grade pupils whose ages 
and test scores are recorded in Table I, Appendix B, School A. How 
could you make use of rank difference correlation? 

23. In the illustrative problem, what would be the effect on $r_{pb}$ or on $r_b$
of assigning "0" to "had studied logic" and "1" to "had not studied
logic"? Of assigning "1" to "had not studied logic" and "2" to "had
studied logic"? Can you make a general statement of the effect on
the biserial coefficients of the assignment of values to the dichotomous
variable?

24. The scores of 50 students on a whole test and their scores on item i
of the test are shown in the table below. The table tells us that 30
responded correctly to the item and 20 incorrectly. Find $r_{pb}$ and $r_b$ and
interpret the results.

(Hint: Find $\bar{Y}_1$ from the distribution under "1"; $\bar{Y}_0$ from the
distribution under "0"; and $\sigma_y$ from the combined distribution.)


SCORE ON           ITEM i
WHOLE TEST        0      1

    30
    29
    28
    27
    26
    25
    24
    23
    22
    21
    20
    19
    18

   SUM           20     30

[The frequencies for the individual score rows are not recoverable.]


25. The data below were obtained by giving 6 Wechsler-Bellevue test
items to 20 college men. Each response was scored correct or incorrect
and was timed to the nearest second. What is the correlation between
speed and accuracy? Does the coefficient support the conclusion that
it tends to take longer to do an item incorrectly than to do it correctly?


TIME IN | CORRECT INCORRECT 
SECONDS | RESPONSE RESPONSE 
5-9 6 
10-14 23 5 
14 6 
12 8 
5 7 
4 5 
2 5 
3 2 
3 4 
1 
4 
1 


The Computation of $r_{xy}$ for Grouped Data


When a correlation problem involves more than about 30 cases, 
it ordinarily is possible to effect a saving of time without serious 
loss of accuracy in finding the product-moment coefficient of corre- 
lation by grouping the scores in a bivariate frequency distribution or 
correlation table. Such a table makes possible the extension of the
correlation methods already discussed to grouped data. 

Consider the mental ages and the vocabulary scores of the pupils 
in Schools A, B, and C, Table I, Appendix B. In all there are 79
pupils in the three schools. A great deal of labor would be necessary 
in finding the product-moment coefficient of correlation, unless one 
had access to a calculating machine. If the group were much larger, 
even machine computation would be extremely laborious. 

The range of mental ages of the 79 pupils is 199 - 116 or 83, and
this suggests a grouping scheme having a class interval of 7 and
112-118 for the lowest class. The vocabulary scores range from
16 to 52, and this suggests an interval of 3 and a lowest class of
15-17. After the classes are labeled as shown in Table 6.6, tallies to
represent pairs of scores are entered in the cells of the table. Pupil
000 has a mental age of 134 and a vocabulary score of 22, so a tally
is entered in the cell where column 133-139 intersects row 21-23;
pupil 001 has a mental age of 180 and a vocabulary score of 46, so a
tally is entered in the cell where column 175-181 intersects row
45-47, and so on.

The entries in columns 1, 2, 3, and 4 at the right of Table 6.6 and
in rows a, b, c, and d at the foot are obtained exactly as in
computing the standard deviation for grouped and coded data, the
subscripts being needed because we are working with two variables,
mental age X and vocabulary Y.

The entries in columns 5 and 6 and rows e and f require
explanation. Consider the four tallies in the 48-50 row in the table. In
coded scores, $d_x$ and $d_y$, these four tallies represent (7,11), (9,11),
(12,11), and (12,11). The sum of the products of the pairs is 77 +
99 + 132 + 132 or 440. Since 11 is the common $d_y$ value, the sum
may be expressed as 11(7 + 9 + 12 + 12). The entry in column 5
opposite row 48-50 is the sum of the $d_x$ values (7 + 9 + 12 + 12)
of the tallies in the row. The entry in the $d_y \Sigma d_x$ column is the
product of this sum and the common $d_y$ value 11. The $d_x$ sums are used
merely to provide a short cut in finding the products of pairs of
coded scores.

The entries in rows e and f are similarly explained. Consider the
two tallies in the 119-125 column. The coded values of these tallies
are (1,2) and (1,3), respectively, and the sum of the products of the
two pairs is 1(2 + 3) or 5. The student should verify the other
entries in columns 5 and 6 and rows e and f. In computing the sums
for column 5 and row e, a runner graduated in coded units, as shown
below, will be found helpful.


0   1   2   3   4   5   6   7   8   9   10   11   12


When data are grouped and coded, formula (6.3) becomes

$$r_{xy} = \frac{N\Sigma(d_y\Sigma d_x) - (\Sigma f_x d_x)(\Sigma f_y d_y)}{\sqrt{N\Sigma f_x d_x^2 - (\Sigma f_x d_x)^2}\,\sqrt{N\Sigma f_y d_y^2 - (\Sigma f_y d_y)^2}}. \tag{6.7}$$

Let us apply the formula to the data of Table 6.6. The various
quantities to be substituted in the formula are:


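$$N\Sigma(d_y\Sigma d_x) = 79 \times 2{,}777 = 219{,}383,$$
$$(\Sigma f_x d_x)(\Sigma f_y d_y) = 410 \times 473 = 193{,}930,$$
$$N\Sigma f_x d_x^2 = 79 \times 2{,}614 = 206{,}506,$$
$$(\Sigma f_x d_x)^2 = (410)^2 = 168{,}100,$$
$$N\Sigma f_y d_y^2 = 79 \times 3{,}483 = 275{,}157,$$
$$(\Sigma f_y d_y)^2 = (473)^2 = 223{,}729.$$


Substituting, we obtain

$$r_{xy} = \frac{219{,}383 - 193{,}930}{\sqrt{206{,}506 - 168{,}100}\,\sqrt{275{,}157 - 223{,}729}} = .57.$$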


If the means and standard deviations of the distributions in the
correlation table are desired, they can be readily computed. By
formula (3.5), p. 95, the means of the mental ages and the vocabulary
scores in Table 6.6 are, respectively,

$$\bar{X} = 115 + 7\left(\frac{410}{79}\right) = 115 + 36.33 = 151.33,$$
$$\bar{Y} = 16 + 3\left(\frac{473}{79}\right) = 16 + 17.96 = 33.96,$$

and by formula (4.10), p. 148, the standard deviations are,
respectively,

$$\sigma_x = \frac{7}{79}\sqrt{206{,}506 - 168{,}100} = 17.36,$$
$$\sigma_y = \frac{3}{79}\sqrt{275{,}157 - 223{,}729} = 8.61.$$
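The arithmetic for the grouped data can be checked from the column and row totals quoted above. The sketch below takes those totals as given (Table 6.6 itself is not reproduced here), together with the class intervals 7 and 3 and the arbitrary origins 115 and 16 used in the text; the variable names are introduced for illustration.

```python
from math import sqrt

# Totals quoted in the text for Table 6.6.
N = 79
sum_fx_dx, sum_fx_dx2 = 410, 2614      # mental age X, coded
sum_fy_dy, sum_fy_dy2 = 473, 3483      # vocabulary Y, coded
sum_dy_sum_dx = 2777                   # sum of products of coded pairs

interval_x, interval_y = 7, 3          # class intervals
origin_x, origin_y = 115, 16           # midpoints of the lowest classes

# Coefficient of correlation for grouped and coded data, formula (6.7).
numerator = N * sum_dy_sum_dx - sum_fx_dx * sum_fy_dy
root_x = sqrt(N * sum_fx_dx2 - sum_fx_dx ** 2)
root_y = sqrt(N * sum_fy_dy2 - sum_fy_dy ** 2)
r_xy = numerator / (root_x * root_y)

# Means and standard deviations decoded back to score units.
mean_x = origin_x + interval_x * sum_fx_dx / N
mean_y = origin_y + interval_y * sum_fy_dy / N
sd_x = interval_x / N * root_x
sd_y = interval_y / N * root_y

print(round(r_xy, 2), round(mean_x, 2), round(mean_y, 2),
      round(sd_x, 2), round(sd_y, 2))
```

Note that the two square roots computed for the coefficient are reused for the standard deviations, which is why little extra labor is needed once the correlation table is complete.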


The student will recall that neither the mean nor standard devi- 
ation for grouped data is affected by the position of the arbitrary 
origin. This is also true for the correlation coefficient. When the 
origins are taken near the center of the two distributions in the 
correlation table, smaller products and sums result than if they 
are taken in the lowest classes, but the saving in labor is impaired 
to some extent by the inconvenience of working with both positive 
and negative numbers instead of positive only. 

There are a great many commercial blank correlation tables,
some of which provide elaborate checks of the correctness of the
computations. The majority of these are so involved that they
tend to make the beginning student a blind follower of directions.
The setup shown in Table 6.6 provides checks at two crucial points,
it can be quickly laid out on any cross-section paper, and it is
straightforward, convenient, and simple. After he has worked
several correlation problems using the setup as shown in the table,
the student is advised to drop rows e and f and to check the
$d_y \Sigma d_x$ column by repeating the operations it calls for. All
computations without cross-checks should, of course, be performed
twice to insure accuracy.

Little more needs to be said regarding the computation of $r_{xy}$ for
grouped data. The work begins with the construction of a correlation
table in which the X class intervals are indicated across the
top and the Y class intervals at the left, the X scores increasing
from left to right, and the Y scores from bottom to top. Ordinarily
there should be between 10 and 20 classes for X and between 10 and
20 classes for Y. Each pair of scores is then entered as a tally in the
appropriate cell. The tallies in the respective rows are summed to
give $f_y$ and those in the columns to give $f_x$. Next, the columns at the
right and the rows at the bottom are completed as shown in Table
6.6. Finally, the appropriate sums are entered in formula (6.7). The
student can convince himself that $\Sigma(d_y \Sigma d_x)$ in the correlation
table is the sum of the products of paired coded scores by actually
computing each product and summing over the entire table.


Exercises 


26. Why are the totals of columns 3, 5, and 6 equal respectively to the
totals of rows e, c, and f, in Table 6.6?

27. Would a change in either or both grouping schemes in the correlation 
table affect the correlation coefficient? Explain. 

28. Find the correlation coefficient between mental age and vocabulary 
in Schools D, E, F, and G, Table I, Appendix B, combined. 

29. Find the coefficient of correlation between Regents' Language and 
College Language, Table II, Appendix B. 


Other Methods of Correlation 


The Fourfold Point Correlation Coefficient as an Approximation
of $r_{xy}$. The tendency of the tallies in a correlation table to
bunch in quadrants I and III, as in Fig. 6.2, when there is positive
correlation between the variables (or in quadrants II and IV when
there is negative correlation) suggests a quick method of
approximating $r_{xy}$.

Consider again the mental age and vocabulary scores in Table
6.6. If we dichotomize both sets of scores by classifying them
according to whether they are above or below selected points, a
2 x 2-fold or fourfold table results, as shown in Table 6.7.


TABLE 6.7

FOURFOLD CLASSIFICATION OF MENTAL AGES AND
VOCABULARY SCORES
(Data from Table 6.6)

                             MENTAL AGE X
                          Below     147 and
                           147       above
VOCABULARY Y     d_y     (d_x = 0) (d_x = 1)    f_y   f_y d_y  f_y d_y²   Σd_x   d_y Σd_x
Upper category    1          9        25         34      34       34       25       25
Lower category    0         30        15         45       0        0       15        0

f_x                         39        40         79      34       34       40       25
                                                       (Σf_y d_y) (Σf_y d_y²)     Σ(d_y Σd_x)
f_x d_x                      0        40         40   (Σf_x d_x)
f_x d_x²                     0        40         40   (Σf_x d_x²)

When we substitute the various sums from the table in formula
(6.7) we obtain

$$r_{xy} = \frac{79 \times 25 - 40 \times 34}{\sqrt{79 \times 40 - (40)^2}\,\sqrt{79 \times 34 - (34)^2}} = \frac{615}{1{,}545} = .40.$$

The product-moment coefficient of correlation of data classified
in the fourfold table is commonly known as the fourfold point
correlation coefficient. We shall designate it $r_p$. (A similar
coefficient is known as the phi coefficient and expressed as φ. There
is no practical difference between our $r_p$ and φ.)
There is a simple formula available for the computation of $r_p$, a
proof of which is given in Appendix A. In the fourfold diagram of
Fig. 6.3 the letters A, B, C, and D represent the frequencies in the
respective cells, the two distributions running from left to right
and from bottom to top, as indicated by the arrows.

        A | B                     a | b     a + b = p1
        --+--                     --+--
        C | D                     c | d     c + d = q1

                          a + c = p2   b + d = q2

Fig. 6.3. The fourfold correlation table for formula (6.8). A, B, C,
and D are absolute frequencies.

Fig. 6.4. The fourfold correlation table for formula (6.8'). a, b, c,
and d are relative frequencies.

When the fourfold table is set up in this manner, the formula for
$r_p$ is

$$r_p = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}}. \tag{6.8}$$


Applying the formula to the problem above, we have

$$r_p = \frac{25 \times 30 - 9 \times 15}{\sqrt{(9 + 25)(30 + 15)(9 + 30)(25 + 15)}} = .40.$$


Formula (6.8) is frequently expressed in terms of relative
frequencies, so that

$$r_p = \frac{bc - ad}{\sqrt{p_1 q_1 p_2 q_2}}, \tag{6.8'}$$

the various quantities being defined as in the diagram of Fig. 6.4. It
may easily be shown that (6.8) and (6.8') are equivalent.
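The equivalence of formula (6.8) with the ordinary product-moment computation on 0 and 1 codes can be checked directly. The sketch below uses the cell frequencies of the worked problem (A = 9, B = 25, C = 30, D = 15), expands them into 79 coded pairs, and compares the two results; the function names are introduced here for illustration.

```python
from math import sqrt

# Cell frequencies laid out as in Fig. 6.3: A and B in the upper row,
# C and D in the lower row; left column is the lower X category.
A, B, C, D = 9, 25, 30, 15


def fourfold_point_r(a, b, c, d):
    """Fourfold point coefficient, formula (6.8)."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson_r(xs, ys):
    """Raw-score product-moment coefficient."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    )


# Expand the table into coded (x, y) pairs: upper row y = 1, right column x = 1.
pairs = [(0, 1)] * A + [(1, 1)] * B + [(0, 0)] * C + [(1, 0)] * D
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

r_p = fourfold_point_r(A, B, C, D)
r_xy = pearson_r(xs, ys)
print(len(pairs), round(r_p, 2), round(r_xy, 2))
```

Both computations give .40 for the 79 pupils.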

The fourfold point coefficient $r_p$ is not ordinarily as good an
approximation of the correlation between two continuous variables as
the tetrachoric coefficient $r_t$, also a fourfold point measure. This
coefficient is described below. The estimate provided by $r_p$ tends to
be consistently too low. Some writers recommend that it be divided by
.637 (a sort of correction for coarseness of grouping), but that
practice is not recommended here. The use of $r_p$ in the case of
continuous variables is recommended only when it is desired to select
the most promising variables among several before computing $r_{xy}$
in the usual manner. For example, it might be desired to approximate
quickly the correlation in various combinations of the admissions
data and freshmen performance, Table II, Appendix B, before
selecting several for more extended analysis. As another example,
the correlation between various pairs of variables of Table I,
Appendix B, might be examined, preliminary to further and more
exact study. For such examination, a plan of classification should be
selected so that A + B is approximately equal to C + D and A + C
to B + D, i.e., the division points should be as near the respective
medians of the two distributions as is convenient.

The Tetrachoric Coefficient as an Approximation of $r_{xy}$.
Tetrachoric correlation is related to fourfold point correlation as
normalized biserial correlation is related to point biserial correlation.

When certain assumptions are made, the most important of
which are that X and Y are continuous variables, linearly related,
and distributed normally, the properties of the normal curve may
be used to derive a formula for the coefficient of correlation of
four-point data. This coefficient is called the tetrachoric coefficient
and is commonly expressed as $r_t$. It is the product-moment
coefficient of correlation in the normal bivariate distribution which
would fit the observed data. If the bivariate population from which the
observed data are taken satisfies the assumptions stated above,
$r_t$ is a good approximation of the population $r_{xy}$.

Both the derivation of the formula for $r_t$ and the formula itself
are mathematically complex. (See Ref. 6, pp. 366-375.) There are,
however, several simplified formulas and methods for computing
$r_t$, among the most useful of which is the cosine-pi formula

$$r_t = \cos\left(\frac{\sqrt{AD}}{\sqrt{AD} + \sqrt{BC}}\,\pi\right), \tag{6.9}$$

in which π is taken as 180° and A, B, C, and D are the frequencies in
cells as shown in the model fourfold correlation table of Fig. 6.3.

Formula (6.9) is easy to use and is recommended in practical
situations. The more exact methods of computing $r_t$ are both
laborious and unjustified. Since the assumptions underlying $r_t$ are
rarely if ever fully satisfied, exactness is likely to be spurious. To
illustrate the use of the formula, we return to the data of Table 6.7.
The values for substitution are

A = 9,   B = 25,   C = 30,   D = 15,

so that

$$r_t = \cos\left(\frac{\sqrt{9 \times 15}}{\sqrt{9 \times 15} + \sqrt{25 \times 30}} \times 180°\right) = \cos 53.6°.$$


From Table L, Appendix C, we find the cosine of 53.6° to be .59.
Thus, by formula (6.9), $r_t$ = .59. The value of $r_p$ for these data is
.40, so here, as is generally the case, $r_t$ is considerably greater than $r_p$.
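The cosine-pi computation can be carried out without a table of cosines. The sketch below evaluates formula (6.9) for the cell frequencies of Table 6.7, with the angle expressed in degrees as in the worked example, and compares the result with the fourfold point coefficient; the function names are introduced here for illustration.

```python
from math import cos, radians, sqrt

# Cell frequencies laid out as in Fig. 6.3.
A, B, C, D = 9, 25, 30, 15


def cosine_pi_angle(a, b, c, d):
    """Angle in degrees used by the cosine-pi approximation, formula (6.9)."""
    return 180 * sqrt(a * d) / (sqrt(a * d) + sqrt(b * c))


def tetrachoric_cosine_pi(a, b, c, d):
    """Cosine-pi approximation of the tetrachoric coefficient."""
    return cos(radians(cosine_pi_angle(a, b, c, d)))


def fourfold_point_r(a, b, c, d):
    """Fourfold point coefficient, formula (6.8), for comparison."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


angle = cosine_pi_angle(A, B, C, D)
r_t = tetrachoric_cosine_pi(A, B, C, D)
r_p = fourfold_point_r(A, B, C, D)
print(round(angle, 1), round(r_t, 2), round(r_p, 2))
```

The angle is 53.6 degrees and the approximation gives .59, against .40 for the fourfold point coefficient.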

The tetrachoric coefficient $r_t$ may be used in exactly the same
way as the fourfold point coefficient $r_p$. If the assumptions which
underlie it are reasonably well met, $r_t$ gives a good approximation
of $r_{xy}$. Both coefficients are particularly useful in approximating
$r_{xy}$ between variables whose values or coded values are punched in
IBM cards.

In addition to their uses in preliminary study, both $r_p$ and $r_t$
have an important application in correlation problems involving truly
dichotomous data.

Application of $r_p$ and $r_t$ to Dichotomous Data. When we need
a measure of relationship between two variables which are truly
dichotomous, i.e., available in only two categories, $r_p$ or $r_t$ is perhaps
as good a measure as any available. Suppose we want to find out
whether there is a relationship between broken homes and delinquency
and that we select a random sample of 100 young individuals
in an area where delinquency is prevalent. Let us suppose that 35 of
our youth come from broken homes and that of these, 16 have been
declared delinquent by a juvenile court, and that 18 of the 65 from
unbroken homes have been so declared. We may classify our data
in the fourfold table


                 DELINQUENT   NONDELINQUENT   TOTAL

UNBROKEN HOME        18            47           65
BROKEN HOME          16            19           35
TOTAL                34            66          100


and compute $r_p$:

$$r_p = \frac{47 \times 16 - 18 \times 19}{\sqrt{65 \times 35 \times 34 \times 66}} = .18.$$


Thus, there is some association between broken homes and delinquency
in our sample. In the same manner, we could investigate
other factors, such as sex, color, or health, which might be associated 
with delinquency. 
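
The computation of the fourfold point coefficient for the delinquency data may be sketched as follows. This is an illustrative Python fragment, not part of the original text; the function name and the lettering of the cells (a, b in the upper row, c, d in the lower) are assumptions, with the upper right and lower left cells treated as the "like" cells, as in the arrangement used above. The frequency 47 is the number of nondelinquents from unbroken homes, 65 - 18.

```python
import math

def fourfold_point_r(a, b, c, d):
    """Fourfold point coefficient for the table

           a  b
           c  d

    with b (upper right) and c (lower left) taken as the cells in
    which the two classifications agree.
    """
    numerator = b * c - a * d
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Rows: unbroken home, broken home. Columns: delinquent, nondelinquent.
r_p = fourfold_point_r(a=18, b=47, c=16, d=19)
print(round(r_p, 2))
```

Interchanging the two rows (or the two columns) reverses the sign of the result but not its size, which is why the arrangement of the table must be stated when a sign is reported.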

Although $r_t$, as far as the operations are concerned, can be applied
to problems like the above, the assumptions underlying it would 
seem questionable. 

Both $r_p$ and $r_t$ are widely used in correlating the responses of a
group to pairs of items on a questionnaire or test. For example, if a
number of pairs of cross-checking questions are included in a questionnaire,
the consistency of response to any pair of questions may
be examined by fourfold point correlation. Suppose that in a pair
of cross-checking questions i and j, the “yes” and “no” responses
of 200 individuals are as follows:


                        QUESTION i

QUESTION j    yes                       80
              no                       120

                                       200


If the responses were perfectly consistent, only the upper right 
and lower left cells would contain frequencies, and either $r_p$ or $r_t$
would be 1.00. As tabulated, the responses are correlated with
$r_p$ = .20. So far as these questions, which supposedly ask the same
thing, are concerned, the questionnaire has little dependability. 

We shall discuss a somewhat similar application of fourfold corre- 
lation in connection with test item analysis, p. 366. 

There are a great many problems in educational research for 
which $r_p$ or $r_t$ is an appropriate measure of relationship. Whenever
two possibly related variables can be observed no more precisely
than dichotomously, fourfold correlation may be useful. If it is
reasonable to suppose that there is normality underlying the
variables, $r_t$ is presumably the more appropriate. The supposition
is usually more or less questionable, however, if the data are truly
dichotomous. For such data, $r_p$ is the product-moment coefficient
and involves no doubtful assumptions. 

The Contingency Coefficient. It is sometimes the case that one 
or both of two possibly correlated variables fall in three or more 
categories. When this is true, fourfold correlation methods cannot 
be used. When the data require classification in a 2 X 3-fold, 3 X 3-fold,
3 X 4-fold, or in general an h X k-fold table (h or k or both
greater than 2) the most useful measure of relationship is the
contingency coefficient C. Before defining C let us examine a set of
bivariate data classified in a contingency table. 

The semester averages, in three categories, and extent of par- 
ticipation in student activities of 146 college freshmen are shown 
in contingency Table 6.8. For the moment, ignore the numbers in 
parentheses in the table, noting only that 10 freshmen with averages 
below 70.0 participated much, 16 an average amount, 14 little, and 
so on.

If there is relationship between semester averages and participa- 
tion in student activities, the frequencies in certain cells will tend 
to be relatively great. If the relationship is positive, relatively 
greater frequencies will appear in the lower left, middle, and upper 
right cells; if negative, in the upper left, middle, and lower right. 
On the other hand, if there is little or no relationship, the fre- 
quencies will tend to show only proportional density in the respec- 
tive cells, i.e., cell frequencies will be distributed in the same ratio 
as the marginal totals. 

TABLE 6.8

SEMESTER AVERAGE AND EXTENT OF PARTICIPATION IN
STUDENT ACTIVITIES OF 146 COLLEGE FRESHMEN
(Data from Table II, Appendix B)

PARTICIPATION IN                  SEMESTER AVERAGE
STUDENT ACTIVITIES    Below 70.0    70.0-80.0    Above 80.0    TOTAL

Much                  10 (11.5)     29 (19.0)     3 (11.5)       42
Average               16 (14.2)     23 (23.5)    13 (14.2)       52
Little                14 (14.2)     14 (23.5)    24 (14.2)       52
TOTAL                 40            66           40             146


These considerations lead to the idea underlying contingency
correlation, namely, the amount of correlation between variables in
the contingency table depends upon the divergence of observed
frequencies in cells from the frequencies which would be expected if there
were no relationship. The first task in computing the contingency
coefficient C is that of determining the frequencies which would be
expected if there were in fact no relationship between the data as
classified in the contingency table. This amounts to determining
frequencies in the cells which are proportional to the marginal totals.


For the data in hand these are:


CELL               EXPECTED FREQUENCY

Upper left         (42 X 40)/146 = 11.5
Upper middle       (42 X 66)/146 = 19.0
Upper right        (42 X 40)/146 = 11.5
Middle left        (52 X 40)/146 = 14.2
Middle middle      (52 X 66)/146 = 23.5
Middle right       (52 X 40)/146 = 14.2
Lower left         (52 X 40)/146 = 14.2
Lower middle       (52 X 66)/146 = 23.5
Lower right        (52 X 40)/146 = 14.2


These frequencies, shown in parentheses in Table 6.8, are the
frequencies we would expect in the various cells if there were no
relationship in our group between semester averages and extent of
participation in student activities. The student can easily verify that
the expected frequencies in any row are distributed in the ratio
40:66:40, and those in any column distributed in the ratio 42:52:52.
We are now ready to define the contingency coefficient C. The
coefficient is usually stated in terms of $\chi^2$ (chi square), $\chi^2$ being
defined by

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \tag{6.10}$$

in which $f_o$ is the frequency observed in a cell and $f_e$ the expected
frequency in that cell. When $\chi^2$ is defined as in (6.10), C is given by

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}, \tag{6.11}$$

where N is the total frequency in the contingency table.
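The computation of C for the data of Table 6.8 is shown below.


CELL              (f_o - f_e)    (f_o - f_e)^2    (f_o - f_e)^2/f_e

Upper left           - 1.5            2.25               .20
Upper middle         +10.0          100.00              5.26
Upper right          - 8.5           72.25              6.28
Middle left          + 1.8            3.24               .23
Middle middle        -  .5             .25               .01
Middle right         - 1.2            1.44               .10
Lower left           -  .2             .04               .00
Lower middle         - 9.5           90.25              3.84
Lower right          + 9.8           96.04              6.76

                                              SUM      22.68

$$\chi^2 = 22.68, \qquad C = \sqrt{\frac{22.68}{146 + 22.68}} = .37.$$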

The fact that C = +.37 tells us that there is correlation in the 
group between semester averages and extent of participation in
student activities. Before we can determine whether the correlation 
is negative or positive we must examine the classification in the 
table and note the cells in which the discrepancies between observed 
and expected frequencies are the most pronounced. In this case 
the relationship is negative, in a meaningful sense. 
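
The whole computation, from expected frequencies through C, can be sketched in a few lines of Python. The fragment below is illustrative only and is not part of the original text; the function name is an assumption, and the observed frequencies other than those of the first column are inferred from the marginal totals and the tabled differences between observed and expected frequencies. Expected frequencies are carried at full precision rather than rounded to one decimal, so chi square differs slightly in the second decimal from a hand computation.

```python
import math

# Observed frequencies. Rows: Much, Average, Little participation.
# Columns: semester average below 70.0, 70.0-80.0, above 80.0.
observed = [
    [10, 29, 3],
    [16, 23, 13],
    [14, 14, 24],
]

def contingency_coefficient(table):
    """Return (chi square, C) for a contingency table of frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            # Expected frequency is proportional to the marginal totals.
            f_e = row_totals[i] * col_totals[j] / n
            chi_square += (f_o - f_e) ** 2 / f_e
    return chi_square, math.sqrt(chi_square / (n + chi_square))

chi_square, c = contingency_coefficient(observed)
print(round(chi_square, 2), round(c, 2))
```

Because every term of chi square is a square, the result carries no sign; the direction of the relationship must still be read from the table itself.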

As a rule, no sign should be attached to C, since the coefficient
indicates only whether our data as classified are related. Inversion
of rows or columns in a given contingency table does not affect the
value or sign of C. Any further interpretation must be made in
light of the nature and classification of the variables. This is a
disadvantage of C, although not ordinarily a serious one. When the
direction of relationship has real meaning, it usually can be inferred 
through inspection of the contingency table. 

The contingency coefficient is not a product-moment coefficient 
of correlation, but when N is large and when the categorized vari- 
ables are continuous, normally distributed, and linearly related, 
C approaches $r_{xy}$ as the number of categories for each variable
increases. The fewer the cells in the contingency table, the more
C underestimates $r_{xy}$. It can be shown that C computed from the
2 X 2-fold table cannot exceed .71; from the 3 X 3-fold table, .82;
from the h X h-fold table, $\sqrt{(h - 1)/h}$. This is a serious disadvantage,
since C's computed from different classifications of even the
same data are not comparable. Furthermore, C's are not comparable
to $r_{xy}$'s. It is sometimes recommended that an obtained C be divided
by its upper limit in order to make it more comparable to $r_{xy}$. In
the problem above, we obtain a C of .37. If we divide .37 by .82,
the upper limit of C in the 3 X 3-fold table, we obtain .45. This
adjusted C of .45 presumably is a better approximation of the
product-moment coefficient than .37. There is, however, no sound 
reason for making the adjustment, and the student is advised to 
report unadjusted C's. If more precise measures of relationship are 
needed, the variables should be measured and analyzed more pre- 
cisely. If they cannot be, precise approximations have little real 
meaning or usefulness. 
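
The upper limits quoted above, and the adjusted value of .45, follow directly from the expression for the limit in an h X h-fold table. A short Python sketch (illustrative, with an assumed function name) reproduces them:

```python
import math

def c_upper_limit(h):
    """Largest value C can take in an h-by-h contingency table."""
    return math.sqrt((h - 1) / h)

for h in (2, 3, 4, 5):
    print(h, round(c_upper_limit(h), 2))

# The adjustment described in the text: .37 divided by the limit .82.
adjusted_c = 0.37 / round(c_upper_limit(3), 2)
print(round(adjusted_c, 2))
```

The limit rises slowly toward 1.00 as h increases, which is one way of seeing why C's from tables of different sizes are not comparable.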

Despite its limitations, C has several important advantages.
Being based upon $\chi^2$, it can easily be tested for significance. (See
pp. 479-482.) It is perhaps the best measure of relationship avail- 
able if we have only categorical observations on two variables, one 
or both of which fall into more than two classes. The coefficient 
involves no assumptions regarding linearity, normality, or com- 
parability of units. It can be used when the variables are continuous, 
discrete, or qualitative, or when one variable is of one kind and 
the other of another kind. 

Correlation of Attributes. The contingency coefficient is par- 
ticularly well adapted to qualitative variables or attributes. It can 
be used in determining the strength of association between occupations
of fathers and sons, between occupation and church membership,
between color of eye and color of hair, between type and
site of cancer, and so on. In analyzing relationships between attri- 
butes, correlation does not have its usual meaning. Correlation 
between attributes can rarely be said to be positive or negative, in a 
meaningful sense. Such correlation cannot be interpreted as, or 
compared with, product-moment correlation, since attributes by 
definition are nonquantitative. Hence, as a measure of strength of 
relationship or association between attributes, the contingency 
coefficient does not suffer from the disadvantages it has as a measure 
of relationship between quantitative variables. 


Exercises 


30. The history and geography scores of 293 eighth-grade pupils are
entered in the fourfold table below. Calculate $r_p$ and $r_t$. Which is the
better coefficient for these data?


                                 HISTORY SCORE
                           Below 24      24 and above

GEOGRAPHY   25 and above
SCORE       Below 25


31. Classify the Regents’ average and the semester average scores of
Table II, Appendix B, in the fourfold table below. Find $r_p$ and $r_t$. Which
coefficient is preferred in this case?


                              REGENTS' AVERAGE SCORE
                           Below 88.0    88.0 and above

SEMESTER    73.0 and above
AVERAGE     Below 73.0
SCORE

32. In a study of teaching efficiency, it was found that of 52 successful 
teachers, 38 had held one or more elective offices as students in high 
school or college. In a group of 37 unsuccessful teachers only 11 had 
held such offices. Classify the data in a fourfold table. Find $r_p$ and $r_t$
and interpret. Which coefficient is preferred in this case? Suggest other 
possible fourfold correlational studies of successful and unsuccessful 
teachers. 
33. In a group of 41 males and 46 females, the responses to a questionnaire
item were: “yes,” 35 males and 15 females; “no,” 6 males and 31 females.
Classify the data in a fourfold table, compute $r_p$, and interpret.
Show that either a positive or a negative value of $r_p$ may be obtained,
depending upon how the table is arranged. When genuine attributes,
such as sex, are involved, does correlation have its usual meaning?

34. Compute and interpret the contingency coefficient of the data in 
Table 2.1, p. 37. (Note: Absolute frequencies, not proportions or 
percentages, must be used in computing C.) 

35. Under what conditions can C be regarded as positive or negative in a 
meaningful sense? 

36. The responses of 40 men and 45 women to four questionnaire items
are shown below. Find the contingency coefficient for each item and
interpret.

                   STRONGLY                                     STRONGLY
                    AGREE      AGREE    UNCERTAIN   DISAGREE    DISAGREE
Item a:  Men           0         15         12          5           8
         Women        10         18          7          8           2
Item b:  Men           5         10         11          8           9
         Women         6         11         15          6           7
Item c:  Men           3          3          9         10          15
         Women         3          5         11         10          16
Item d:  Men          15         17          4          3           1
         Women         1          6         10         12          16

37. A sample of 1,831 Cleveland elementary school children were examined
in speed and quality of reading. One hundred and eighty-three of the 
1,831 demonstrated good quality and fast reading; 201, good quality 
and medium speed; 73, good quality and slow speed; 220, medium 
quality and fast reading; 476, medium quality and medium speed; 
220, medium quality and slow speed; 73, poor quality and fast reading; 
220, poor quality and medium speed; and 165, poor quality and slow 
speed. Classify the data in a contingency table, find C, and interpret. 

38. Discuss the uses, limitations, and advantages of C as a measure of
relationship. 

39. By algebraic proof or by application to the data in a given 2 X 2-fold 
table, show that $r_p^2 = C^2/(1 - C^2)$.

40. Describe one or more specific situations, not mentioned in the text, 
in which the contingency coefficient would be appropriate and useful. 
Linear Regression


The fact that correlated data tend to establish a pattern when 
plotted in a scatter diagram suggests another way of measuring 
and describing the association between two variables, namely, that 
of mathematically describing the pattern. 

The Linear Pattern. The simplest pattern which correlated data 
may tend to generate is the straight line; in fact, as has already 
been seen, a product-moment coefficient of correlation of 1.00 characterizes
data which lie on a straight line in the scatter diagram.
(We are here concerned only with data which are linearly related,
if related at all; i.e., data which are fitted as well or better by a
straight line than some other curve.) 

Consider the mental ages and Arithmetic Problems Test scores 
of the 23 pupils of School A, Table I, Appendix B. The data are 
entered in Table 6.9 and are plotted in the scatter diagram of Fig. 
6.5. (In order to simplify computations, the constant 130 has been 
subtracted from each of the mental ages in the table.) The data 
tend to fall in a linear pattern, but with considerable scattering, 
indicating that, although performance on the Arithmetic Prob- 
lems Test is related to mental age, variables other than mental 
age also influence the performance. The relationship between the 
two given variables may be summarized by the usual product- 
moment coefficient of correlation. Substituting the sums at the foot 
of Table 6.9 in formula (6.3) we have 


$$r_{xy} = \frac{23 \times 8{,}729 - 513 \times 358}{\sqrt{23 \times 14{,}439 - (513)^2}\,\sqrt{23 \times 6{,}104 - (358)^2}} = .59.$$
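
The substitution can be verified with a short computation. The Python sketch below is illustrative and not part of the original text; the function name is an assumption, and the six arguments are the number of pupils and the five sums at the foot of Table 6.9.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Product-moment r computed from raw-score sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# N and the sums of X, Y, X^2, Y^2, and XY from Table 6.9.
r_xy = pearson_r_from_sums(23, 513, 358, 14439, 6104, 8729)
print(round(r_xy, 2))
```

Subtracting the constant 130 from every mental age leaves the result unchanged, since r depends only on deviations from the means.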


There is a more complete and frequently more useful method of
describing the relationship between two variables than that pro- 
vided by correlation, although, as we shall see, the method is inti- 
mately related to correlation. The method consists essentially in 
determining both the equation of the line generated by bivariate 
data, and a measure of the extent to which the data scatter about 
the line. 

TABLE 6.9

MENTAL AGES AND ARITHMETIC PROBLEMS TEST
SCORES OF 23 EIGHTH-GRADE PUPILS

     MENTAL AGE    ARITHMETIC
       - 130        PROBLEMS
         X             Y            X^2        Y^2         XY

         4            11             16        121         44
        50            26          2,500        676      1,300
        27            10            729        100        270
        34            22          1,156        484        748
        15             9            225         81        135
         9             6             81         36         54
        20            15            400        225        300
        26            14            676        196        364
        17            21            289        441        357
        11            16            121        256        176
        31            14            961        196        434
        28            16            784        256        448
        15            14            225        196        210
         5            10             25        100         50
        31            23            961        529        713
        34            21          1,156        441        714
        15            19            225        361        285
        15            11            225        121        165
        13            18            169        324        234
        15            16            225        256        240
        31            16            961        256        496
        27            16            729        256        432
        40            14          1,600        196        560

SUM    513           358         14,439      6,104      8,729


The Equation of a Straight Line. It will be recalled from elementary
algebra that the general equation of a straight line is


y = bx + a,


in which b is the slope of the line (the tangent of the angle the line
makes with the X-axis) and a is the y-intercept (the value of y when
x is zero). When b and a are given, the equation describes one and
only one straight line. For example, if b is 2 and a is 5, we have


y = 2x + 5.


By substituting convenient values, say 2, 3, and 4 for x, we have
x = 2, y = 9; x = 3, y = 11; x = 4, y = 13. If plotted, the points
(2,9), (3,11), (4,13) lie on a straight line, and all values which
satisfy the statement of relationship between x and y, expressed in
the equation y = 2x + 5, will lie on the line.

The Regression Line. Observational data, like those of Fig. 6.5, 
never behave as nicely as mathematical data, and we can, as usual 
in statistics, deal only with the “general tendency.” 


[Fig. 6.5: scatter diagram of Arithmetic Problems Test score Y (vertical
axis) against mental age X (horizontal axis, 135 to 185), with the two
regression lines drawn.]

Fig. 6.5. Scatter diagram and regression lines. (From Table 6.9.)


What line can we construct which will fairly summarize the 
linear tendency of the dots in Fig. 6.5? We might use a ruler, and,
by inspection, draw a line as close to the various dots as possible. 
Such a graphical method would have some merit, but it would not 
give uniform results. Different individuals would construct different 
lines.

There are several mathematical methods by which the equation
of the line which “best fits” correlated data like those of Fig. 6.5
might be found. The standard method used in statistics is that of
“least squares.” The “least squares” method is one which determines
the equation of the line such that the sum of the squares
of the vertical distances of the dots from the line is a minimum.
(One of the 23 distances d is shown in Fig. 6.5.) The general equation
of the “least squares” line is, in deviation score form,

$$y' = \frac{\sum xy}{\sum x^2}\,x, \tag{6.12}$$

and, in raw score form,

$$Y' = \left[\frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}\right](X - \bar{X}) + \bar{Y}. \tag{6.13}$$

The symbols y' and Y' are used instead of y and Y because the
equation gives the values which the y-variable would have for
various values of the x-variable if the data actually fell in a straight
line, i.e., the primes indicate theoretical or idealized values of Y.
The quantity $\sum xy/\sum x^2$ or $[N\sum XY - (\sum X)(\sum Y)]/[N\sum X^2 - (\sum X)^2]$
is called the coefficient of regression of Y on X and is customarily
written $b_{yx}$. Thus, in terms of deviation scores, $b_{yx} = \sum xy/\sum x^2$; in
terms of raw scores, $b_{yx} = [N\sum XY - (\sum X)(\sum Y)]/[N\sum X^2 - (\sum X)^2]$.
The line whose equation is given by (6.12) or (6.13) is called the
line of regression of Y on X.

Returning to our example, we may now determine the equation 
of the line of regression of Arithmetic Problems Test scores on 
mental ages. The mean mental age is 513/23 + 130 or 152.30 and
the mean problem score is 358/23 or 15.57. Substituting these
means and the sums at the foot of Table 6.9 in formula (6.13),
we have

$$Y' = \left[\frac{23 \times 8{,}729 - 513 \times 358}{23 \times 14{,}439 - (513)^2}\right](X - 152.30) + 15.57,$$

and, after simplification,

$$Y' = .25X - 22.5.$$

This is the equation of the regression line of Y on X for the data in
hand. The line is shown in Fig. 6.5. The slope of the line or the
coefficient of regression $b_{yx}$ is .25.

Had the “least squares” line been determined so as to make the
sum of the squares of the horizontal distances of the dots from the
line a minimum, its equation in raw score form would have been

$$X' = \left[\frac{N\sum XY - (\sum X)(\sum Y)}{N\sum Y^2 - (\sum Y)^2}\right](Y - \bar{Y}) + \bar{X}. \tag{6.14}$$

The quantity $[N\sum XY - (\sum X)(\sum Y)]/[N\sum Y^2 - (\sum Y)^2]$ is called the
coefficient of regression of X on Y and is customarily written $b_{xy}$.
The line whose equation is given by (6.14) is called the line of
regression of X on Y. In the present example, the equation of this
line, when the appropriate means and sums from Table 6.9 are
substituted in formula (6.14), is

$$X' = \left[\frac{23 \times 8{,}729 - 513 \times 358}{23 \times 6{,}104 - (358)^2}\right](Y - 15.57) + 152.30,$$

which simplifies to

$$X' = 1.40Y + 130.5.$$

The line is drawn and labeled in Fig. 6.5. The slope of the line or
$b_{xy}$ is 1.40.
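
Both regression equations can be obtained from the same six quantities. The Python sketch below is illustrative and not part of the original text; the variable names are assumptions. It carries full precision throughout, so the constant term of the first line comes out near -22.2 rather than the -22.5 obtained by hand after the slope has been rounded to .25.

```python
# N and the sums at the foot of Table 6.9 (mental ages coded as X - 130).
n, sum_x, sum_y = 23, 513, 358
sum_x2, sum_y2, sum_xy = 14439, 6104, 8729

mean_x = sum_x / n + 130   # restore the constant subtracted in the table
mean_y = sum_y / n

cross = n * sum_xy - sum_x * sum_y
b_yx = cross / (n * sum_x2 - sum_x ** 2)   # regression of Y on X
b_xy = cross / (n * sum_y2 - sum_y ** 2)   # regression of X on Y

a_yx = mean_y - b_yx * mean_x   # constant term of Y' = b_yx X + a_yx
a_xy = mean_x - b_xy * mean_y   # constant term of X' = b_xy Y + a_xy

print(f"Y' = {b_yx:.2f}X + ({a_yx:.1f})")
print(f"X' = {b_xy:.2f}Y + {a_xy:.1f}")
```

The coded sums may be used for the slopes because subtracting a constant from every X changes neither the numerator nor the denominators; only the mean of X must be restored to its original scale.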

The student may well ask how it is that one set of related data 
permits two “best-fit” regression lines. The answer is to be found 
in the method of finding the equations of the lines. When the sum 
of squares of the vertical distances of the dots from the line is 
minimized, the line of (6.12) or (6.13) results; when the sum of 
squares of the horizontal distances of the dots from the line is 
minimized, equation (6.14) results. Neither line can be said to 
provide a better description than the other, and, as we shall later 
see, both have their places in the analysis and interpretation of 
bivariate data. Ordinarily, in a given problem the regression line 
of the variable which is considered to be dependent is the one 
determined, e.g., if the relationship between scholastic aptitude X
and scholastic achievement Y were being described, the regression
line of Y on X ordinarily would be reported. In the above case, the
Arithmetic Problems Test scores would ordinarily be considered 
dependent upon mental age, so that the regression line of Y on X 
would be the one of primary concern. The more highly the data 
are correlated, the more the lines tend to coincide; in fact, for per- 
fectly correlated data, the lines do coincide. 
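
That each line is "best" only in its own sense can be shown numerically. The following Python sketch is illustrative and not part of the original text; the two lists are an assumption of the sketch, transcribed from Table 6.9 (their sums agree with the totals at the foot of the table), and the helper names are hypothetical. Both lines pass through the point of means, so each is described here by its slope alone, expressed as change in Y per unit change in X.

```python
# Coded mental ages (X - 130) and test scores, transcribed from Table 6.9.
x = [4, 50, 27, 34, 15, 9, 20, 26, 17, 11, 31, 28,
     15, 5, 31, 34, 15, 15, 13, 15, 31, 27, 40]
y = [11, 26, 10, 22, 9, 6, 15, 14, 21, 16, 14, 16,
     14, 10, 23, 21, 19, 11, 18, 16, 16, 16, 14]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b_yx = sxy / sxx   # slope of the line of regression of Y on X
b_xy = sxy / syy   # slope of the line of regression of X on Y

def vertical_ss(slope):
    """Sum of squared vertical distances from a line through the means."""
    return sum((yi - mean_y - slope * (xi - mean_x)) ** 2
               for xi, yi in zip(x, y))

def horizontal_ss(slope):
    """Sum of squared horizontal distances from the same line."""
    return sum((xi - mean_x - (yi - mean_y) / slope) ** 2
               for xi, yi in zip(x, y))

# As change in Y per unit X, the X-on-Y line has slope 1 / b_xy.
print(round(vertical_ss(b_yx)), round(vertical_ss(1 / b_xy)))
print(round(horizontal_ss(1 / b_xy)), round(horizontal_ss(b_yx)))
```

The first line of output shows the Y-on-X line giving the smaller vertical sum of squares; the second shows the X-on-Y line giving the smaller horizontal sum.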

Relation of Regression Coefficients to $r_{xy}$. An interesting and
useful relationship between the regression coefficients $b_{yx}$ and $b_{xy}$
and the product-moment correlation coefficient $r_{xy}$ exists. Since
$b_{yx} = \sum xy/\sum x^2$, $b_{xy} = \sum xy/\sum y^2$, and $r_{xy} = \sum xy/(N\sigma_x\sigma_y)$, a bit of
algebraic manipulation will result in the equalities

$$b_{yx} = r_{xy}\,\frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = r_{xy}\,\frac{\sigma_x}{\sigma_y}.$$

Thus, the regression equations may be written

$$Y' = r_{xy}\,\frac{\sigma_y}{\sigma_x}(X - \bar{X}) + \bar{Y}, \tag{6.15}$$

$$X' = r_{xy}\,\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) + \bar{X}. \tag{6.16}$$

For the illustrative data above, $\bar{X}$ = 152.30, $\sigma_x$ = 11.41, $\bar{Y}$ = 15.57,
$\sigma_y$ = 4.81, and $r_{xy}$ = .59. Making the appropriate substitutions
and simplifying in equations (6.15) and (6.16) we obtain, as before,


Y' = .25X - 22.5
X' = 1.40Y + 130.5.


It is interesting to note that the geometric mean of the regression
coefficients is the correlation coefficient. The proof of this is left
as an exercise for the student.

The above relationships between the regression and the correlation
coefficients hold only for product-moment correlation coefficients.
This is one of the reasons why $r_{xy}$ is favored over other
measures of correlation. 
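
A numerical check of these relationships, using the summary values quoted for the illustrative data, may be sketched in Python as follows. The fragment is illustrative only and the variable names are assumptions. Note that the square root returns the positive value; when both regression coefficients are negative, the correlation coefficient takes their sign.

```python
import math

# Summary values quoted in the text for the data of Table 6.9.
r_xy, sigma_x, sigma_y = 0.59, 11.41, 4.81

b_yx = r_xy * sigma_y / sigma_x   # coefficient of regression of Y on X
b_xy = r_xy * sigma_x / sigma_y   # coefficient of regression of X on Y

# The standard deviations cancel in the product, leaving r squared.
geometric_mean = math.sqrt(b_yx * b_xy)

print(round(b_yx, 2), round(b_xy, 2), round(geometric_mean, 2))
```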

Regression Lines for Grouped Data. The extension of regression
methods to grouped data is both simple and instructive. Let
us examine a typical problem.

The Regents’ average and the semester average scores of the 76
college freshmen of Table II, Appendix B, are entered in correlation
Table 6.10. Under the assumption that the scores fall at the midpoints
of their respective classes, we may compute the means of
the rows and the means of the columns of the grouped scores. The
mean of the three scores in the 90.0-94.9 row is (94.95 + 2 X
96.95)/3 or 96.3. The mean of the six scores in the 94.0-95.9 column
is (92.45 + 4 X 87.45 + 82.45)/6 or 87.4. The student should
verify the correctness of the other means of rows and columns. 


TABLE 6.10

[Correlation table of Regents' average X (columns) and semester average Y
(rows) for the grouped data, showing the means of the rows and columns
and the two regression lines. Data from Table II, Appendix B.]


When the means of the columns and the rows in the correlation 
table are plotted, as shown by the ×'s and ○'s, respectively, in
Table 6.10, they tend to fall in straight lines running from the
lower left to the upper right portions of the table, if the data are
linearly related. It was such "progression of means" which led
Sir Francis Galton to formulate the law of regression in his investigation
of heights of parents and offspring. (Galton actually worked
with column and row medians rather than means.)

It can be shown that the line which best fits the means of the 
columns in the least squares sense also best fits the scores and vice 
versa. Hence, the equations of the regression lines fitted to means 
will be the same as those fitted to ungrouped data. The regression 
lines for grouped as well as ungrouped data are given by equations 
(6.13) and (6.14) or (6.15) and (6.16). 

For the grouped data of Table 6.10,

$$\bar{X} = 87.16, \quad \sigma_x = 4.95, \quad \bar{Y} = 75.02, \quad \sigma_y = 7.63, \quad r_{xy} = .65,$$

so that for these data the regression equations are

$$Y' = .65\,\frac{7.63}{4.95}(X - 87.16) + 75.02 = 1.00X - 12.14,$$

$$X' = .65\,\frac{4.95}{7.63}(Y - 75.02) + 87.16 = .42Y + 55.65.$$

The equations are plotted in Table 6.10. We shall return to this line
of regression of Y on X in connection with prediction. 
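
The two equations for the grouped data can be reproduced from the five summary values alone. The Python sketch below is illustrative and not part of the original text; the function name and argument order are assumptions. The slope is rounded to two decimals before the constant term is found, which is the order of operations that yields the constants -12.14 and 55.65.

```python
def regression_line(r, sigma_dep, sigma_indep, mean_dep, mean_indep):
    """Slope and constant of the line predicting the dependent variable.

    The slope is rounded to two decimals before the constant is
    computed, following a hand computation.
    """
    slope = round(r * sigma_dep / sigma_indep, 2)
    constant = mean_dep - slope * mean_indep
    return slope, constant

# X: Regents' average (mean 87.16, s.d. 4.95);
# Y: semester average (mean 75.02, s.d. 7.63); r = .65.
slope_yx, const_yx = regression_line(0.65, 7.63, 4.95, 75.02, 87.16)
slope_xy, const_xy = regression_line(0.65, 4.95, 7.63, 87.16, 75.02)

print(f"Y' = {slope_yx:.2f}X + ({const_yx:.2f})")
print(f"X' = {slope_xy:.2f}Y + {const_xy:.2f}")
```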

The Regression Tendency. Since the time of Galton, it has been 
recognized that tall parents tend to have offspring less tall, and 
short parents offspring less short, than themselves. In Galton’s
words (Ref. 2, p. 95):

However paradoxical it may appear at first sight, it is theoretically a 
necessary fact, and one that is clearly confirmed by observation, that 
the Stature of the adult offspring must, on the whole, be more mediocre 
than the stature of their Parents; that is to say, more near to the M
[median] of the general Population.


Galton referred to this tendency as the law of regression, and general- 
ized to various hereditary traits in these words (Ref. 2, p. 106): 


The law of Regression tells heavily against the full hereditary trans- 
mission of any gift. Only a few out of many children would be likely to 
differ from mediocrity so widely as their Mid-Parent [average of parents]
and still fewer would differ as widely as the more exceptional of the two
Parents. The more bountifully the Parent is gifted by nature, the more 
rare will be his good fortune if he begets a son who is as richly endowed 
as himself, and still more so if he has a son who is endowed yet more 
largely. But the law is even-handed; it levies an equal succession-tax 
on the transmission of badness as of goodness. If it discourages the ex- 
travagant hopes of a gifted parent that his children will inherit all his 
powers; it no less discountenances extravagant fears that they will 
inherit all his weakness and disease. 

It must be clearly understood that there is nothing in these statements 
to invalidate the general doctrine that the children of a gifted pair are 
much more likely to be gifted than the children of a mediocre pair. They 
merely express the fact that the ablest of all the children of a few gifted 
pairs is not likely to be as gifted as the ablest of all the children of a 
very great many mediocre pairs.


The regression tendency is observable in all situations in which 
bivariate data are imperfectly correlated. A group of students of 
superior academic aptitude are, on the average, less superior in 
academic achievement; a group of students of inferior aptitude are, 
on the average, less inferior in achievement. A group of highly
intelligent men will be found to be married to less intelligent 
women, on the average, and vice versa. Tall men are, as a group, 
less extreme in weight than in height. Heavy men or light men 
are, as a group, less extreme in height than in weight. These facts 
are true because, like traits of parents and offspring, academic 
aptitude and achievement, intelligence of married pairs, and height 
and weight are imperfectly correlated. In general, imperfectly correlated
measures show regression toward the mean. This is an important
“law” for teachers, school counselors, and research students, and 
we shall illustrate its meaning in some detail. 

In the scatter diagram of Fig. 6.5, five pupils have mental ages 
of 145. Since the mean of the mental ages is 152.3 and the standard 
deviation 11.41, the standard score equivalent of 145 is —.64. The 
Arithmetic Problems Test scores of the 5 are 9, 11, 14, 16, and 19, 
with a mean of 13.8. The mean of all of the test scores is 15.6, and 
the standard deviation is 4.9. Hence, 13.8 has a standard score 
equivalent of —.37. Thus, the 5 pupils, as a group, are nearer the 
mean of the problem solving test than the mean of the mental ages, 
when the distances are measured in comparable units. 
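
The comparison in standard scores can be sketched as follows. The Python fragment is illustrative and not part of the original text; the function name is an assumption, and the means and standard deviations are those quoted in the paragraph above. The last line applies the regression equation in standard-score form, in which the predicted standard score on Y is $r_{xy}$ times the standard score on X, using the value .59 found earlier for these data.

```python
def standard_score(value, mean, sd):
    """Distance of a value from the mean in standard deviation units."""
    return (value - mean) / sd

scores_at_145 = [9, 11, 14, 16, 19]   # test scores of the 5 pupils
mean_score = sum(scores_at_145) / len(scores_at_145)

z_mental_age = standard_score(145, 152.3, 11.41)
z_test = standard_score(mean_score, 15.6, 4.9)

# Standard score on the test expected from the regression line alone.
predicted_z = 0.59 * z_mental_age

print(round(z_mental_age, 2), round(z_test, 2), round(predicted_z, 2))
```

The observed standard score of the group on the test lies close to the value expected from regression, and both are nearer zero than the standard score on mental age.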

In Table 6.10 the 11 freshmen having Regents’ scores in the
column whose mid-value is about 91 have a mean semester average
of about 79. The standard score equivalent of the former is about
.77 and that of the latter about .52. In other words, the 11 freshmen
are, as a group, nearer the mean of the semester averages than the 
mean of the Regents’ scores. We may note the same regression 
tendency toward the mean of the Regents’ scores. For example, the 
seven freshmen having semester averages in the row whose mid- 
value is 87.4 have a mean Regents’ score of 93.2. The standard 
score equivalent of the former is 1.62 and that of the latter 1.22. 

It can be shown that the regression tendency always exists when
data are imperfectly correlated. If we select a set of equal or nearly 
equal values of one variable, the paired values of the related variable 
will, as a group, tend to be less extreme than the former. This tend- 
ency does not mean that all of the paired values will be less ex- 
treme; in fact, it may be that one or more will be more extreme. 
The majority of them, however, will be less extreme, i.e., will tend 
to show regression toward the mean, if comparable units are used.

The regression tendency tells us that we can expect, as a rule,
to find exceptional individuals in one trait less exceptional in related
traits. The reason why, say, students of a given degree of academic
aptitude exhibit, on the average, a more moderate degree
of academic achievement is not clear, but so long as aptitude and
achievement are imperfectly correlated, the tendency is inevitable.
The higher the correlation, of course, the less marked the tendency.

Fig. 6.6. Data characterized by similar lines of regression of Y
on X, but dissimilar scatter about the lines.

Scatter about the Regression Line. The relationship between
two variables is not adequately described by a regression line.
Sets of data, like those shown in Fig. 6.6, may yield quite similar
regression lines, yet be characterized by quite dissimilar degrees of
relationship.

It has been seen that linearly related data tend both to generate a
line and to scatter about the line; hence, both line and scatter need 
to be considered in summarizing the relationship. The need is 
analogous to the need for measures of both central tendency and 
variability in describing the single frequency distribution; in fact, 
the regression line is a sort of “mean line” as will be brought out in 
connection with prediction. 

Both the linear trend and the scatter of correlated data are 
clearly seen in Fig. 6.5. For those data, the relationship between 
X and Y, Y being considered the dependent variable, can be described 
by the regression line Y' = .25X - 22.5. When we inspect 
Fig. 6.5, however, we note that the observed Y's are scattered 
quite markedly about the line. If the relationship were perfect, a 
pupil having a mental age of 145 months would have a score of 
about 14 on the Arithmetic Problems Test. The 5 pupils having 
mental ages of 145 months actually have test scores of 9, 11, 14, 
16, and 19, respectively. 

How shall we measure the extent of scatter of all of the Y’s from 
the regression line? Numerous methods might be used, but the 
method used in statistics is that of determining the standard deviation 
of the differences between the theoretical Y's and the observed Y’s. 
It will be seen that these differences are merely the vertical distances 
of the observed Y's from the line of regression of Y on X. 

The theoretical Y's in the present example, as determined by 
substituting the successive values of X in the equation Y' = 
.25X - 22.5, are shown in column 3 of Table 6.11, and the differences 
between the theoretical and observed values are shown in 
column 4. These differences are in deviation form (they would sum 
to zero but for the effect of rounding), and we may find their standard 
deviation by squaring them, summing the squares, dividing by 23 
(the number of pupils), and extracting the square root. When this 
is done, we obtain 3.89. This standard deviation of the differences 
between the observed Y's and the values they would have if the 
correlation were perfect is customarily written $\sigma_{\text{est } y}$ or 
$\sigma_{y \cdot x}$, and is read the standard error of estimate or the standard 
error of Y independent of X. 

There is a simple formula by which $\sigma_{y \cdot x}$ can be determined, the 
derivation of which is included in Appendix A: 

$$\sigma_{y \cdot x} = \sigma_y \sqrt{1 - r_{xy}^2}. \tag{6.17}$$

TABLE 6.11 
OBSERVED AND THEORETICAL ARITHMETIC 
PROBLEMS TEST SCORES CORRESPONDING TO 
GIVEN MENTAL AGES AND DIFFERENCES 
BETWEEN THEM 
(Data from Table 6.9) 

  X      Y     Y' = .25X - 22.5     Y - Y'     (Y - Y')^2 
 134     11         11.0              0.0         0.00 
 180     26         22.5              3.5        12.25 
 157     10         16.8             -6.8        46.24 
 164     22         18.5              3.5        12.25 
 145      9         13.8             -4.8        23.04 
 139      6         12.2             -6.2        38.44 
 150     15         15.0              0.0         0.00 
 156     14         16.5             -2.5         6.25 
 147     21         14.2              6.8        46.24 
 141     16         12.8              3.2        10.24 
 161     14         17.8             -3.8        14.44 
 158     16         17.0             -1.0         1.00 
 145     14         13.8              0.2         0.04 
 135     10         11.2             -1.2         1.44 
 161     23         17.8              5.2        27.04 
 164     21         18.5              2.5         6.25 
 145     19         13.8              5.2        27.04 
 145     11         13.8             -2.8         7.84 
 143     18         13.2              4.8        23.04 
 145     16         13.8              2.2         4.84 
 161     16         17.8             -1.8         3.24 
 157     16         16.8             -0.8         0.64 
 170     14         20.0             -6.0        36.00 
 SUM                                 -0.6       347.80 



When we apply the formula to our illustrative data, remembering 
that $\sigma_y$ is 4.81 and $r_{xy}$ is .59, we obtain 

$$\sigma_{y \cdot x} = 4.81\sqrt{1 - (.59)^2} = 3.88,$$

which is in close agreement with the value we obtained by actually 
finding the standard deviation of the differences Y - Y'. 
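
The agreement between the two routes can be checked numerically. The following Python sketch is an illustration added for the reader, not part of the original computation: it takes the X and Y columns of Table 6.11, finds the root mean square of the differences Y - Y' about the line Y' = .25X - 22.5, and compares the result with formula (6.17), using a standard deviation of Y and a correlation computed from the same 23 pairs. The helper names (`mean`, `sd`, `correlation`) are assumptions made for the example.

```python
import math

# X (mental age in months) and Y (Arithmetic Problems score), Table 6.11.
X = [134, 180, 157, 164, 145, 139, 150, 156, 147, 141, 161, 158,
     145, 135, 161, 164, 145, 145, 143, 145, 161, 157, 170]
Y = [11, 26, 10, 22, 9, 6, 15, 14, 21, 16, 14, 16,
     14, 10, 23, 21, 19, 11, 18, 16, 16, 16, 14]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with divisor N, as in the text."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(xs, ys):
    """Product-moment coefficient of correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))


# Route 1: root mean square of Y - Y' about the line Y' = .25X - 22.5.
residuals = [y - (0.25 * x - 22.5) for x, y in zip(X, Y)]
se_direct = math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))

# Route 2: formula (6.17), sigma_y * sqrt(1 - r^2).
r_xy = correlation(X, Y)
se_formula = sd(Y) * math.sqrt(1 - r_xy ** 2)

print(f"direct: {se_direct:.2f}  formula: {se_formula:.2f}  r: {r_xy:.2f}")
```

Both routes give a value close to the 3.89 and 3.88 reported in the text. The direct figure here is computed from unrounded values of Y', whereas Table 6.11 rounds each Y' to one decimal before differencing, so a difference in the second decimal place is to be expected.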
By similar procedure we might find the standard deviation of 
the differences between the observed X’s and the idealized or theoretical 
values given by the equation X' = 1.40Y + 130.5. The 
student will find that the standard deviation of the differences 
X - X' is 9.20, which is in close agreement with the result obtained 
when 11.41 and .59 are substituted for $\sigma_x$ and $r_{xy}$, respectively, in 
the formula 

$$\sigma_{x \cdot y} = \sigma_x \sqrt{1 - r_{xy}^2}. \tag{6.18}$$


Formulas (6.17) and (6.18) are illuminating. They state explicitly 
that the scatter about the regression line is a function of the standard 
deviation of the dependent variable and the correlation coefficient. 
The magnitude of $\sigma_{y \cdot x}$ or $\sigma_{x \cdot y}$ indicates the extent to which the 
variable is affected by variables other than the correlated one. We 
return to these measures of scatter about the regression line in 
connection with prediction and the interpretation of the coefficient 
of correlation. 

Derivation and Use of the Regression Equation. Since the 
derivation of the regression equation has been illustrated at several 
points in the preceding paragraphs, we need only to summarize the 
procedure here. 

Although the correlation coefficient does not have to be computed 
in deriving the regression equations for a particular set of bivariate 
data, it is ordinarily advisable to compute $r_{xy}$ and then to use 
formulas (6.15) and (6.16). (The $r_{xy}$ is then available for use in 
determining the standard errors of estimate.) If the regression 
coefficients are needed, they can easily be obtained through the 
relationship, previously noted: 

$$b_{yx} = r_{xy}\,\frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = r_{xy}\,\frac{\sigma_x}{\sigma_y}.$$


Formulas (6.15) and (6.16) call for only means, standard deviations, 
and the coefficient of correlation. These are, of course, computed in 
the usual manner. The formulas are applicable to either grouped or 
ungrouped data. 
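
As an illustration of how little the formulas require, the Python sketch below builds both regression equations from five summary figures. It assumes that formulas (6.15) and (6.16), which are not reproduced in this section, have the usual form Y' = r(σy/σx)(X - mean of X) + mean of Y, with the corresponding form for X'. The means 152.30 and 15.57 are computed from the X and Y columns of Table 6.11; the function name `regression_line` is hypothetical.

```python
def regression_line(r, sd_dep, sd_ind, mean_dep, mean_ind):
    """Return (slope, intercept) for predicting the dependent variable.

    slope     = r * sd_dep / sd_ind   (the regression coefficient)
    intercept = mean_dep - slope * mean_ind
    """
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return slope, intercept


# Summary statistics for mental age (X) and arithmetic problems score (Y).
r_xy = 0.59
mean_x, sd_x = 152.30, 11.41
mean_y, sd_y = 15.57, 4.81

b_yx, a_yx = regression_line(r_xy, sd_y, sd_x, mean_y, mean_x)  # Y on X
b_xy, a_xy = regression_line(r_xy, sd_x, sd_y, mean_x, mean_y)  # X on Y

print(f"Y' = {b_yx:.2f}X + ({a_yx:.1f})")
print(f"X' = {b_xy:.2f}Y + ({a_xy:.1f})")
```

The slopes agree with the .25 and 1.40 of the text, and the X-on-Y intercept with 130.5. The Y-on-X intercept comes out nearer -22.3 than -22.5, presumably because the text rounded the slope to .25 before computing the intercept.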

Both the Y on X and the X on Y regression coefficients and lines 
are of importance in statistical theory, but in practical work only 
one ordinarily is of concern. When one of the variables is designated 
as dependent on the other, only one of the regression lines has prac- 
tical meaning. If, for example, the relation of diet to weight were 
being investigated, weight logically would be considered the depend- 
ent variable, and the regression of weight on diet would be deter- 
mined. Similarly, as has been noted, in studying the relation 
between scholastic aptitude and school achievement, school achieve- 
ment would ordinarily be considered the dependent variable, and 
the regression of achievement on aptitude would be of concern. It 
is conventional in statistics, when dealing with bivariate data, to 
designate the dependent variable Y and the independent variable X. 
In this notation, the regression line of practical concern is the line 
of regression of Y on X. 

Regression theory has three important applications in applied 
research: (1) the estimation or prediction of one variable from 
knowledge of a related variable, (2) the analysis and interpretation 
of the relationship between two variables, and (3) the control of 
the effect of one or more variables upon the relationship between 
two others. We shall discuss these uses at some length in the sections 
following. 

Exercises 


41. Given the equation y = -2x + 3, complete the table of values and 
construct the line on cross-section paper. 

[Table of x and y values not reproduced.] 


42. If data are negatively correlated, show that the signs of the regression 
coefficients are negative. What is the direction of the regression lines 
in the table of negatively correlated data? 

43. Substitute various values for X in the equation Y' = .25X - 22.5, 
and show that the resulting values of Y' lie on the line of regression of 
Y on X in Fig. 6.5. Similarly check the graph of X' = 1.40Y + 130.5. 

44. Investigate the relationships between the two regression lines of per- 
fectly correlated data and of noncorrelated data. Consider either a 
numerical example or the general case. 

45. Write the regression equations for the data of Table 6.1, p. 233. Find 
$\sigma_{y \cdot x}$ and $\sigma_{x \cdot y}$ and interpret. 

46. Show that the regression equation of Y on X, in standard score form, 
is $z'_y = r_{xy} z_x$. 

47. The table below contains the semester averages and SAT scores for 
the 72 of the 76 freshmen of Table 6.10 for whom Regents’ averages are 
available. Derive the equation of the line of regression of Y on X, and 
determine the standard error of estimate $\sigma_{y \cdot x}$. Interpret the two. 

[Correlation table of semester average Y, in classes running from 
50.0-54.9 to 90.0-94.9, against SAT score X. The column headings 
and cell frequencies are illegible in this copy.] 


48. Referring to the table in exr. 47, assume that the mean of the SAT 
scores of the 10 freshmen in the 1,150-1,199 column is 1,174.5. What is 
the mean of the semester averages of these 10, using the mid-values 
of the classes as the values of the respective averages? Express the 
distances of these two means from their respective general means in 
standard units and compare. What does the comparison illustrate? 

49. A school counselor noted that students of low IQ's as a group made 
better school marks, in proportion to their ability, than students of 
high IQ's. The counselor concluded that the students of low IQ's were 
trying harder. Is the conclusion supported by the evidence? 

50. What is meant by the regression tendency? What is its significance 
for teachers and guidance counselors? 


The Regression Line in Prediction 


It is frequently useful to estimate or predict the theoretical score 
of an individual in one variable, given his score in a related variable. 
For example, if it is known that scores on an aptitude test will later 
be correlated with success in a certain occupation, the aptitude 
test scores may be used in predicting the success of given individuals 
in the occupation. 

The idea of prediction, in this sense, naturally follows regression. 
In the prediction situation, the dependent variable is customarily 
referred to as the criterion variable and the independent variable 
as the predictor variable. In the above example, aptitude test performance 
is the predictor, success in occupation the criterion. The 
nature of prediction can be brought out most easily and clearly by 
reference to a practical problem. 

Statistical Prediction. Let us think about the work of the Admissions 
Committee at the college from which the data of Table II, 
Appendix B, were obtained. The first job of the Committee is, of 
course, that of selecting high school graduates who show promise 
of satisfactory achievement in college and screening out those who 
do not. Each applicant for the freshman class submits various 
evidence in support of his application, such as rank in class, recom- 
mendations from his secondary school official, College Entrance 
Examination Board scores, and the like, and, if he has attended a 
New York State high school, Regents' examination scores. 

Any of these variables which is correlated with success in college 
is useful to the Committee in estimating the chances that a given 
applicant, if admitted, will in fact succeed. For example, if appli- 
cants of future year(s) are like the group from which the freshmen 
of Table 6.10 were selected, i.e., are members of the same popu- 
lation of potential freshmen, the relationship between their Regents’ 
averages and semester averages will be comparable to the relation- 
ship existing in the data of Table 6.10. The equation of the line of 
regression of semester averages on the Regents’ averages is, as 
derived in the preceding section, Y’ = 1.00X — 12.14. This regres- 
sion equation gives the best estimate of the semester average that 
will be made by an admitted applicant having a particular Regents’ 
average. By use of the equation, we would predict a semester aver- 
age Y’ of about 78 for an applicant having a Regents’ average of 
90.0, since we would have Y' = 1.00 x 90.0 - 12.14 = 77.86. This is the 
best estimate of the semester average which will in fact be paired 
with a Regents’ average of 90.0. In other words, the mean semester 
average which will be earned by freshmen who had a Regents’ 
average of 90.0 will presumably be at or near 78, provided that what 
has been true in the past holds true in the future. By similar use of the 
regression equation, the semester average theoretically paired with 
any given Regents’ average can be predicted. 

It will be seen that predicted scores are merely the theoretical 
scores Y' of the preceding section. In any situation in which we have 
been able to determine the relationship between a predictor and a 
criterion variable, it is possible to use the regression equation in 
prediction. The important condition underlying the procedure is 
that the individuals for whom predictions are being made are mem- 
bers of the same population as the sample in which the relationship 
was originally determined. It is furthermore necessary that the 
factors affecting performance on predictor and criterion variables 
remain constant, or nearly so. If the conditions are not met, the 
observed relationship cannot be expected to obtain in the future. 

Reliability of Prediction. If the correlation between Regents’ 
and semester average scores were perfect, the work of the Admis- 
sions Committee, considered above, would be simple and pleasant 
indeed. Unfortunately, the semester averages scatter about the 
regression line, so that various averages are associated with a par- 
ticular Regents’ average. If we examine Table 6.10, we note that 
the former range from about 67 to about 87 for Regents' averages 
of 90.0 to 91.9. Can the extent of the scatter be estimated and 
allowed for in prediction? 

The answer is yes, provided we are able to make three assump- 
tions. The first is that the relationship between predictor and cri- 
terion variables is linear. The second is that the scatter of criterion 
scores in one column of the correlation table is equal to the scatter 
in any other, or would be if enough cases were added to smooth 
sampling irregularities. In our present example, this assumption 
means that in the population of freshmen, past, present, and future, 
from which the 76 freshmen are a sample, the semester averages 
which are, or will be, associated with a given Regents' average will 
show the same scatter as those associated with any other. Technically, 
this is known as the assumption of homoscedasticity (homo 
means equal and scedasticity means scattering). If the relationship 
is linear, of course, equal scattering in columns results in equal 
scattering about the line of regression. 

The third assumption we must make in order to judge the reliability 
of prediction is that the differences between predicted and 
observed criterion scores (semester averages in our example) will 
be distributed normally in each column. This amounts to the 
assumption that the observed criterion scores will be distributed 
normally in each column. 

Under the assumptions of linearity, homoscedasticity, and 
normality of criterion scores, the standard error of estimate, $\sigma_{y \cdot x}$, 
defined in the preceding section, enables us to gauge the reliability 
of prediction. In Chapter V we learned that standard deviation 
unit distances or z intervals include fixed proportions of normally 
distributed scores. Since $\sigma_{y \cdot x}$ is the standard deviation of the errors 
of estimate (differences between observed and predicted criterion 
scores), we may determine the proportion or percentage of criterion 
scores falling within $\sigma_{y \cdot x}$ unit distances of their predicted value, 
and hence the chances that a criterion score, yet to be observed, 
will fall within or outside of some specified interval. 

The manner in which criterion scores will be distributed about 
the regression line, if the conditions and assumptions underlying 
prediction are satisfied, is depicted in Fig. 6.7. About 68 per cent 
of the criterion scores in any column will presumably fall within 
$\pm 1\sigma_{y \cdot x}$ of their predicted value; about 95 per cent within 
$\pm 2\sigma_{y \cdot x}$; and practically all within $\pm 3\sigma_{y \cdot x}$. 

Returning to the problem of predicting freshmen semester averages 
from Regents’ averages, $\sigma_{y \cdot x}$ computed from formula (6.17) is 

$$\sigma_{y \cdot x} = 7.63\sqrt{1 - (.65)^2} = 5.80.$$


In this example, then, the standard deviation of the differences 
between predicted and observed semester averages, or the standard 
error of estimate, is 5.80, and we may gauge the reliability of pre- 
diction accordingly. We may reasonably expect that about 68 per 
cent of the semester averages, yet to be made, will fall within ±5.80 
of their predicted value; about 95 per cent within ±11.60 of their 
predicted value, and so on. 
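
The arithmetic of this paragraph can be sketched in Python. The figures 7.63 and .65 and the equation Y' = 1.00X - 12.14 are those of the text; the function names are hypothetical, chosen only for this illustration.

```python
import math


def standard_error_of_estimate(sd_y, r_xy):
    """Formula (6.17): sd_y * sqrt(1 - r_xy^2)."""
    return sd_y * math.sqrt(1 - r_xy ** 2)


def predict_semester_average(regents_average):
    """Regression of semester average on Regents' average (Table 6.10)."""
    return 1.00 * regents_average - 12.14


def band(predicted, se, k):
    """Interval of k standard errors of estimate about a predicted value."""
    return predicted - k * se, predicted + k * se


se = standard_error_of_estimate(7.63, 0.65)
y_pred = predict_semester_average(90.0)

print(f"standard error of estimate: {se:.2f}")
for k, share in ((1, "about 68 per cent"), (2, "about 95 per cent")):
    low, high = band(y_pred, se, k)
    print(f"{share} of averages expected between {low:.2f} and {high:.2f}")
```

The bands are meaningful only under the assumptions of linearity, homoscedasticity, and normality stated above; the sketch does nothing to check them.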

Moreover, $\sigma_{y \cdot x}$ enables us to determine the chances that a criterion 
score will deviate from its predicted value by more than some 
specified amount. In our example, we would predict a semester 
average of 72.86 for an applicant having a Regents’ average of 
85.0, since we would have Y' = 1.00 x 85.0 - 12.14. If this applicant 
is admitted, what are the chances that he will make an average 
of 60 or less? Since 60 is 12.86/5.80 or about 2.22$\sigma_{y \cdot x}$ below 72.86, 
the chances are about 1.3 in 100 or 13 in 1,000, as deduced from 
Table A, Appendix C. Interpreted in another way, if 1,000 applicants 
having Regents’ averages of 85.0 were admitted, 987 would 
be expected to make semester averages of 60 or above and 13 
expected to make averages of 60 or below. The ordinate at 60, which 
is 2.22$\sigma_{y \cdot x}$ below the predicted value 72.86, would divide the 1,000 
averages as shown in Fig. 6.8. 

Fig. 6.7. Distribution of criterion scores in columns of the 
correlation table, if the assumptions of homoscedasticity, normality, 
and linearity are satisfied. (Within $\pm 1\sigma_{y \cdot x}$, 68.3% of the 
scores; within $\pm 2\sigma_{y \cdot x}$, 95.4% of the scores; within 
$\pm 3\sigma_{y \cdot x}$, 99.7% of the scores.) 

It also would be expected that about 68 per cent of the 1,000 
applicants would make semester averages between 72.86 ± 5.80, 
or between 67.06 and 78.66; about 95 per cent would make semester 
averages between 72.86 ± 2(5.80), or between 61.26 and 84.46; 
and so on. By use of $\sigma_{y \cdot x}$ and Table A, it is possible to determine 
the percentage of criterion scores which would fall within or outside 
of any specified interval. 

By exactly the same procedures, we can determine the chances 
that any semester average will deviate by more than some specified 
amount from its predicted value, or that it will fall within or outside 
of a given interval. 
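
A rough check of these figures is possible without the printed table. The Python sketch below uses the standard library's `statistics.NormalDist` as a stand-in for Table A, Appendix C (an assumption of this illustration, since the table itself is not reproduced here), and takes 72.86 and 5.80 from the text.

```python
from statistics import NormalDist

se = 5.80          # standard error of estimate, from the text
predicted = 72.86  # predicted semester average for a Regents' average of 85.0

# Under the normality assumption, the semester averages in a column are
# distributed normally about the predicted value with standard deviation se.
column = NormalDist(mu=predicted, sigma=se)

# Deviation of 60 from the predicted value, in standard errors of estimate.
z = (60 - predicted) / se

# Chances of an average of 60 or less.
p_60_or_less = column.cdf(60)

# Proportions expected within one and two standard errors of estimate.
p_within_1 = column.cdf(predicted + se) - column.cdf(predicted - se)
p_within_2 = column.cdf(predicted + 2 * se) - column.cdf(predicted - 2 * se)

print(f"z = {z:.2f}")
print(f"chances of 60 or less: {p_60_or_less:.3f}")
print(f"within 1 s.e.: {p_within_1:.3f}; within 2 s.e.: {p_within_2:.3f}")
```

The same three lines of arithmetic (deviation, division by the standard error of estimate, reference to the normal curve) apply to any other cutting point or interval.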

Although our description of the method of judging the reliability 
of prediction has been confined largely to a particular example, the 
method is perfectly general, provided the conditions and assumptions 
noted above are met. In any situation in which Y is predicted 
from X, the standard error of estimate $\sigma_{y \cdot x}$, being the standard 
deviation of the differences between observed and predicted 
values of Y, enables us to determine the probability or chances 
of a deviation from prediction greater (or less) than some specified 
amount. The method consists essentially in dividing the deviation 
by $\sigma_{y \cdot x}$ and referring the resulting z to Table A, Appendix C. 
Although the method does not make allowance for the possible sampling 
error in the means and standard deviations used in the regression 
equation, it is accurate enough for ordinary purposes. The 
more exact methods of judging the reliability of prediction are 
too involved and cumbersome for discussion here. 

Fig. 6.8. Number of criterion scores in 1,000 expected to fall 2.22 
or more standard errors of estimate below their predicted value. 

Let us comment, parenthetically, upon a rather common criticism 
of statistical prediction. The criticism is based upon the fact that, 
although it may be known that a certain percentage of individuals 
for whom predictions are made will in fact make criterion scores 
below (or above) a particular point, there is no way of telling which 
particular individuals will do so. Such uncertainty, however, is by 
no means confined to statistical prediction. It perplexes all attempts 
to foretell whether a particular individual will succeed or fail. The 
best that can be done is to state the individual’s chances in terms 
of the proportions of comparable individuals who have succeeded 
or failed. The criticism stems from the human desire for certainty 
in a field where none exists. It is impressive only because it suggests 
that, when decisions which affect individuals must be made, such 
as admission to a college or to a vocation, the decisions should be 
made in light of as much relevant information about each individual 
as can be obtained, in addition to predicted criterion scores. 

Summary. Two conditions are involved in using the regression 
equation in prediction. The first condition is that individuals for 
whom predictions are made are similar in all relevant respects to 
those upon whose past performance the regression equation is based; 
the second, that the factors affecting the predictor and criterion 
scores remain constant, or nearly so. 

Three assumptions underlie the estimation of the reliability of 
prediction: (1) linearity of regression, (2) equal scatter of criterion 
scores in the columns of the correlation table, and (3) normal dis- 
tribution of criterion scores in the columns. The first two assump- 
tions taken together mean equal scatter of criterion scores about 
the regression line. 

The logic underlying statistical prediction and the estimation of 
the reliability of statistical prediction includes these points: 


a. Correlation between a criterion and a predictor variable has 
been observed in a sample from a specified population. 

b. The relationship will hold true in further samples from the same 
population. 

c. Hence criterion scores, yet to be observed, will in fact pair with 
predictor scores in essentially the same way as they have in 
the past. 

d. Hence the reliability of predicted criterion scores can be esti- 
mated from the extent to which past criterion scores have 
scattered about the regression line. 


Since prediction is the forecasting of criterion scores yet to be 
observed, the soundness of the assumptions can be judged only 
from past experience. Ordinarily they are considered sound if the 
two conditions of prediction are met and if, in the past, regression 
has been linear and observed criterion scores distributed normally. 
It should be emphasized, however, that all conditions and assump- 
tions underlying prediction and the reliability of prediction should 
be continuously scrutinized in the light of the information derived 
from follow-up studies. 

In a later section we shall return to statistical prediction, con- 
sidering there the problem of predicting criterion scores from two 
or more related variables. 


Exercises 


51. Referring to the illustrative example in the text, 

a. What semester average would be predicted for an applicant having 
a Regents’ average of 80.0? 

b. If 100 applicants having Regents’ averages of 80.0 were admitted, 
how many would be expected to make semester averages below 65? 

c. What are the chances that an applicant having a Regents’ average 
of 80.0 would, if admitted to college, make a semester average of 
65 or above? 

d. The chances are 95 in 100 that an applicant having what Regents’ 
average would make a semester average of 75 or above? 

e. What are the conditions and assumptions underlying the above 
procedures? 


52. In the illustrative example in the text, 

a. What semester average would be predicted for an applicant, if 
nothing were known about him except that he belonged to the same 
population as the freshmen of Table 6.10? The chances are about 
68 in 100 that the average he would make if admitted would fall 
in what range? 

b. If the additional information were available that the applicant's 
Regents’ average was 97.16, what semester average would be predicted? 
The chances are about 68 in 100 that the average he would 
make if admitted would fall in what range? 

c. In what sense is the second prediction better than the first? 


53. The relationship $\sigma_{y \cdot x} = \sigma_y \sqrt{1 - r_{xy}^2}$ tells us that if $r_{xy} = 1.00$, 
$\sigma_{y \cdot x} = 0$; and that if $r_{xy} = 0$, $\sigma_{y \cdot x} = \sigma_y$. What does $r_{xy}$ equal when 
$\sigma_{y \cdot x} = \sigma_y/2$? In what sense does this value of $r_{xy}$ take 50 per cent of the 
“guess work” out of prediction? 

54. From the information of exr. 47, p. 281, 

a. If 200 applicants having SAT scores of 1,000 were admitted, how 
many would be expected to make semester averages below 60? Two 
hundred applicants having SAT scores above 1,200? 

b. If an applicant has a score of 1,300 on SAT, how sure is it that 
he will make a semester average of 75 or above? 

c. If a student makes a semester average of 85, what are the odds 
that his SAT score was 1,000 or more? (Hint: use the regression 
equation of X on Y and $\sigma_{x \cdot y}$.) 


55. In a school of dentistry it was found that scores on a chalk carving 
test were correlated with grades in a basic technics course. The statistics 
were: 

CARVING TEST X          TECHNICS GRADES Y 
$\bar{X} = 30.0$              $\bar{Y} = 80.0$ 
$\sigma_x = 5.0$      $r_{xy} = .60$      $\sigma_y = 10.0$ 

It was decided to require all applicants in the future to take the carving 
test. 

a. Under what conditions can the technics grades for applicants 
logically be predicted from the carving test scores? 

b. What grade would be predicted for an applicant who scored 25 on 
the carving test? 

c. What are the assumptions under which the reliability of the grade 
predicted in (b) can be estimated? 

d. What are the chances that the grade predicted in (b) would not 
in fact be 60 or lower? 

e. What are the chances that the grade predicted in (b) would not 
be 85 or more? 

f. The chances are about 95 in 100 that the grade predicted in (b) 
would fall in what range? 

56. A personnel official in an industrial concern found that about 25 per 
cent of the stenographers he hired did unsatisfactory work. He decided 
to give all applicants for stenographic positions a shorthand-typing 
test. List the procedures, conditions, and assumptions necessary to 
using the test as a predictor test. 


The Interpretation of the Product-Moment Coefficient 
of Correlation 


Such phrases as “low correlation,” “moderate correlation,” and 
“high correlation” have little to recommend them save conversational 
convenience. Although the phrases will inevitably persist, 
it must be understood that the coefficient of correlation is too 
complex and too diverse in use to permit general interpretation as 
low, moderate, or high. 

The product-moment coefficient of correlation $r_{xy}$ is a sensitive, 
abstract measure of relationship between two variables. Its limits 
are -1.00 and 1.00. When $r_{xy} \neq 0$, there is some degree of functional 
relationship between the variables. When $r_{xy} = \pm 1.00$, the 
relationship is perfect. Since research is primarily concerned with 
relationships, correlation is an extremely useful tool, particularly 
in exploratory work. We have seen that it has three broad uses: 
(1) to determine what variables may or may not be related to a 
given variable, (2) to describe the linear relationship between two 
variables quantitatively, and (3) to predict values of one variable 
from given values of a related variable. 

The interpretation of $r_{xy}$ varies according to the use of $r_{xy}$, although 
both uses and interpretations overlap to some extent. When 
the correlatives of a given variable are sought, “low” coefficients 
may be of importance, because they may demonstrate relationships 
where none was believed to exist. Coefficients of zero may be of 
importance, if they discredit superstitious beliefs regarding the 
causes of a given phenomenon. We shall return to this use and interpretation 
of $r_{xy}$ in connection with problems of testing statistical 
hypotheses. 

When $r_{xy}$ is used to describe the relationship between two variables 
or to predict one from knowledge of the other, ordinarily its 
most useful and clear interpretation is based upon regression theory. 

Relations of $r_{xy}$ to Errors of Estimate and Variances. Although 
we shall consider here only the standard error in estimating 
Y from X, our remarks may readily be applied to $\sigma_{x \cdot y}$, when X is 
considered to be the dependent variable. The standard error of 
estimate $\sigma_{y \cdot x}$ is, by formula (6.17), 

$$\sigma_{y \cdot x} = \sigma_y \sqrt{1 - r_{xy}^2}.$$

As we have seen, $\sigma_{y \cdot x}$ is the standard deviation of the differences 
between the observed values of Y and the theoretical values obtained 
from the regression equation of Y on X. It is a measure of the dispersion 
or scatter of the observed Y's about the regression line, and 
hence indicates the extent to which variables other than X are 
influencing Y. 
By squaring both members of the above expression, we obtain 

$$\sigma_{y \cdot x}^2 = \sigma_y^2 \left(1 - r_{xy}^2\right), \tag{6.19}$$

in which $\sigma_{y \cdot x}^2$ is known as the variance error of estimate and $\sigma_y^2$ as 
the variance of the observed Y's. (In general, the square of the 
standard deviation is known as the variance. See p. 143.) Solving 
for $r_{xy}^2$, we get 

$$r_{xy}^2 = 1 - \frac{\sigma_{y \cdot x}^2}{\sigma_y^2}. \tag{6.20}$$

(The fact that the limits of $r_{xy}$ are +1 and -1 can readily be deduced 
from (6.20), since $\sigma_{y \cdot x}^2$ can be neither negative nor greater than $\sigma_y^2$.) 

We may arrive at a second expression for $r_{xy}^2$ in terms of variances. 
The regression equation of Y on X may be written $y' = r_{xy}(\sigma_y/\sigma_x)x$, 
in which y' is the predicted score in deviation form. Squaring, summing, 
and dividing by N, we obtain 

$$\frac{\sum y'^{2}}{N} = r_{xy}^2\,\frac{\sigma_y^2}{\sigma_x^2}\cdot\frac{\sum x^2}{N} = r_{xy}^2\,\sigma_y^2,$$

since $\sum x^2/N = \sigma_x^2$. Now $\sum y'^{2}/N$ is the variance of the predicted 
scores, i.e., the variance of Y which results from variation of X. 
Designating this variance $\sigma_{y'}^2$ and solving for $r_{xy}^2$, we have 

$$r_{xy}^2 = \frac{\sigma_{y'}^2}{\sigma_y^2}. \tag{6.21}$$

From equations (6.20) and (6.21) it follows that $\sigma_y^2 = \sigma_{y'}^2 + \sigma_{y \cdot x}^2$. 
This is an instructive relationship. It tells us that, when X and Y 
are correlated, the total variance of Y is equal to the variance predictable 
from or accounted for by X plus the variance due to factors 
other than X, i.e., the variance not accounted for by X. 

It is evident from (6.21) that the proportion of the total variance 
of Y accounted for by X is equal to the square of the correlation 
coefficient, $r_{xy}^2$. Let us apply this fact to the data of Table 6.9. The 
correlation between arithmetic problems scores and mental age is 
.59. Hence, $(.59)^2$ or about 35 per cent of the variance of the arithmetic 
problems scores is accounted for by variation in mental age. 
It follows that about 65 per cent of the variance of arithmetic problems 
scores must be attributed to variables other than mental age. 
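
The partition of the variance can be verified on the same data. The Python sketch below is an added illustration: it uses the 23 pairs listed in Table 6.11 and the unrounded least-squares line (rather than the rounded Y' = .25X - 22.5), so that the identity holds to the limit of floating-point accuracy. The variable names are assumptions of the example.

```python
import math

# X (mental age in months) and Y (Arithmetic Problems score), Table 6.11.
X = [134, 180, 157, 164, 145, 139, 150, 156, 147, 141, 161, 158,
     145, 135, 161, 164, 145, 145, 143, 145, 161, 157, 170]
Y = [11, 26, 10, 22, 9, 6, 15, 14, 21, 16, 14, 16,
     14, 10, 23, 21, 19, 11, 18, 16, 16, 16, 14]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n
var_x = sum((x - mean_x) ** 2 for x in X) / n
var_y = sum((y - mean_y) ** 2 for y in Y) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / n
r_xy = cov_xy / math.sqrt(var_x * var_y)

# Predicted scores from the least-squares regression of Y on X.
slope = cov_xy / var_x
predicted = [mean_y + slope * (x - mean_x) for x in X]

# Variance of the predicted scores and variance error of estimate.
var_pred = sum((p - mean_y) ** 2 for p in predicted) / n
var_error = sum((y - p) ** 2 for y, p in zip(Y, predicted)) / n

print(f"total variance of Y:        {var_y:.3f}")
print(f"predictable + error:        {var_pred + var_error:.3f}")
print(f"proportion accounted for:   {var_pred / var_y:.3f}")
print(f"square of the correlation:  {r_xy ** 2:.3f}")
```

The last two printed figures coincide, which is the content of formula (6.21), and both are close to the 35 per cent quoted above.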

When $r_{xy}^2$ is used to determine the proportion of variance of the 
dependent variable which can be attributed to variation of the independent 
variable, it is sometimes called the coefficient of determination, 
and the quantity $1 - r_{xy}^2$ the coefficient of nondetermination. 
It is important to note that the interpretation of $r_{xy}^2$ in terms of 
proportions or percentages is confined to variances. 


Fig. 6.9. The scatter of scores about the line of 
regression of Y on X and about the mean of the total 
Y distribution. The ratio $\sigma_{y \cdot x}/\sigma_y$ is equal to $\sqrt{1 - r_{xy}^2}$. 

A second interpretation of $r_{xy}$ is possible by reference to the regression 
equation and the standard error of estimate, as they are used 
in prediction. From the general regression equation (6.15), we note 
that when $r_{xy} = 0$, we would obtain a Y' equal to $\bar{Y}$ for any value 
of X. In other words, if $r_{xy} = 0$, knowledge of X would be of no 
help in predicting Y. If $r_{xy} = \pm 1.00$, a unit change in Y would 
accompany a unit change in X. In this case, to know X would be 
to know Y. 

The value of a given $r_{xy}$ in prediction is reflected by the standard 
error of estimate $\sigma_{y \cdot x}$, a fact we have already made use of in judging 
the reliability of predicted scores. By an extension of the concept 
of $\sigma_{y \cdot x}$ as a measure of the dispersion of the observed Y’s about the 
line of regression, we can arrive at a useful and general method of 
judging the predictive value of an $r_{xy}$. Formula (6.17) obviously 
may be written 

$$\frac{\sigma_{y \cdot x}}{\sigma_y} = \sqrt{1 - r_{xy}^2}. \tag{6.22}$$


It is evident from (6.22) and Fig. 6.9 that the quantity $\sqrt{1 - r_{xy}^2}$ 
indicates the proportion of total scatter of Y (as measured by the 
standard deviation) remaining in any column of the correlation 
table, provided the assumptions of homoscedasticity and linearity 
are satisfied, and thus measures the extent to which a given $r_{xy}$ aids 
in prediction. When $r_{xy} = 0$, the ratio $\sigma_{y.x}/\sigma_y$ equals 1 and $\sigma_{y.x} = \sigma_y$. 
In this case the Y's scatter about the regression line as much as 
they scatter about their own mean. As $|r_{xy}|$ increases, the ratio 
decreases, although not proportionately. When $r_{xy} = \pm 1$ the ratio 
is 0, and $\sigma_{y.x}$ is 0. This means that the observed scores fall on the 
regression line, i.e., the observed and predicted values of Y are 
identical, so that there is no error of estimate. 

The values of $\sqrt{1 - r_{xy}^2}$ corresponding to selected values of $r_{xy}$ 
are shown in Table 6.12. This is an instructive table. It tells us, for 
example, that when $|r_{xy}| = .866$ the ratio of $\sigma_{y.x}$ to $\sigma_y$ is about .50. 
In other words, when $|r_{xy}| = .866$ the dispersion of the Y's about 
the line of regression is about 50 per cent as much as their dispersion 
about $\bar{Y}$. 


TABLE 6.12 
SELECTED VALUES OF $|r_{xy}|$ AND 
CORRESPONDING VALUES OF $\sqrt{1 - r_{xy}^2}$ 

$|r_{xy}|$    $\sqrt{1 - r_{xy}^2}$        $|r_{xy}|$    $\sqrt{1 - r_{xy}^2}$ 
  .00           1.000                .80            .600 
  .10            .995                .866           .500 
  .20            .980                .900           .436 
  .30            .954                .925           .380 
  .40            .917                .950           .312 
  .50            .866                .975           .222 
  .60            .800                .990           .141 
  .70            .714               1.000           .000 

In general, the quantity $\sqrt{1 - r_{xy}^2}$ is an index of the predictive 
value of the correlation coefficient. For a practical application, we 
return to the Regents' and semester averages of Table 6.10. For 
those data $r_{xy} = .65$, so that $\sqrt{1 - r_{xy}^2} = \sqrt{1 - (.65)^2} = .76$. 
Hence, the dispersion of the semester averages about the regression 
line is about 76 per cent as much as their dispersion about their own 
mean. 

The quantity $\sqrt{1 - r_{xy}^2}$ is commonly known as the coefficient of 
alienation and is designated by the letter $k$. The quantity $1 - k$ 
is sometimes called an index of the efficiency of prediction. 
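
The entries of Table 6.12 are easy to regenerate. The sketch below, in Python, computes the coefficient of alienation and the index of efficiency of prediction for a few values of the correlation coefficient. The function names `alienation` and `efficiency` are labels chosen for this illustration only.

```python
import math


def alienation(r):
    """Coefficient of alienation: sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)


def efficiency(r):
    """Index of efficiency of prediction: 1 minus the coefficient of alienation."""
    return 1.0 - alienation(r)


for r in (0.00, 0.50, 0.65, 0.866, 0.90, 0.99):
    print(f"r = {r:.3f}   alienation = {alienation(r):.3f}   efficiency = {efficiency(r):.3f}")
```

Running the loop over a finer grid of values makes it plain how slowly the scatter about the regression line shrinks until the correlation becomes quite high.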

$r_{xy}$ as a Measure of Rate of Change. In standard score form 
the equation of the line of regression of Y on X is 

$$z'_y = r_{xy} z_x \qquad (6.23)$$

Now this equation summarizes the linear relationship between Y 
and X, the two being expressed in comparable units. When we 
examine it we note that when $z_x$ changes, $z'_y$ increases, remains fixed, 
or decreases to an extent that depends entirely upon $r_{xy}$. Thus $r_{xy}$ 
indicates the amount of covariation which characterizes two variables, 
when they are expressed in comparable units and when 
regression is linear. When $r_{xy} = 1.00$, the covariation or functional 
relationship is perfect, and to know one variable is to know the 
other. When it is different from 0, $r_{xy}$ indicates the rate at which Y 
changes with X. 

Although the interpretation of $r_{xy}$ as a measure of rate of change 
has some application in the exact sciences, it is mainly of theoretical 
interest in educational research. 

The Effect of Variability of Data upon $r_{xy}$. One of the most 
important things to keep in mind in interpreting $r_{xy}$ is its sensitivity 
to variability in the bivariate sample. If we were to find the correlation 
between height and weight in a group heterogeneous in respect 
to either height or weight, the coefficient would be substantially 
greater than in a relatively homogeneous group. If we gave, say, a 
vocabulary test to a group extremely variable in mental age, we 
would find the correlation between vocabulary and mental age 
greater than in a group less variable in mental age, other things 
being equal. 

As a case in point, consider the correlation between the Regents' 
averages and college freshmen averages shown in Table 6.10. 
The two are correlated with $r_{xy} = .65$. Now these freshmen were 
selected partly on the basis of Regents' averages; hence, those admitted 
to college were less variable in Regents' averages than New 
York State high school graduates at large. Suppose that the standard 
deviation of Regents' averages, at large, is about 10.0, as compared 
to a standard deviation of about 5.0 for our selected freshmen. 
Can an adjustment or correction be made for the effect of the restriction 
or curtailment of variability upon $r_{xy}$? 

The answer is yes, provided we can make two assumptions. The 
first is that the regression of semester averages on Regents' averages, 
at large, would be linear; the second that the semester averages 
would be homoscedastic or equally scattered in columns. Under 
these assumptions it can be shown (Ref. 4, p. 225) that the correlation 
between X and Y in the unselected or larger group may be 
estimated by 

$$r'_{xy} = \frac{r_{xy}(\sigma'_x/\sigma_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2(\sigma'_x/\sigma_x)^2}} \qquad (6.24)$$

in which $r_{xy}$ is the correlation between variables X and Y in the 
restricted group; $\sigma_x$ is the standard deviation of the restricted group 
in the X variable upon which selection was made; $\sigma'_x$ is the standard 
deviation of the larger group in the X variable; and $r'_{xy}$ is the estimate 
of the correlation which would exist between X and Y in 
the larger group. 

To make use of formula (6.24) in estimating $r'_{xy}$ in our example, 
we substitute the values $r_{xy} = .65$, $\sigma_x = 5.0$, $\sigma'_x = 10.0$, and have 

$$r'_{xy} = \frac{.65(10.0/5.0)}{\sqrt{1 - (.65)^2 + (.65)^2(10.0/5.0)^2}} = .86$$

The value .86 is the estimated or theoretical correlation between 
Regents' averages and semester averages, the effect of selection 
being eliminated. 
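
The substitution in formula (6.24) can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `correct_for_restriction` and its argument names are assumptions introduced for the illustration.

```python
import math


def correct_for_restriction(r_restricted, sd_restricted, sd_unrestricted):
    """Estimate the correlation in the larger group from that in the restricted
    group, following the pattern of formula (6.24)."""
    ratio = sd_unrestricted / sd_restricted
    r2 = r_restricted ** 2
    return r_restricted * ratio / math.sqrt(1.0 - r2 + r2 * ratio ** 2)


# Regents' example: r = .65 in the selected group, standard deviations 5.0 and 10.0.
print(round(correct_for_restriction(0.65, 5.0, 10.0), 2))
```

When the two standard deviations are equal the ratio is 1 and the function returns the observed coefficient unchanged, as it should.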

Formula (6.24) may be used to estimate $r_{xy}$ for a restricted group 
if we know $r'_{xy}$ in the larger group and the respective standard 
deviations. When used for this purpose, it is more convenient when 
written 

$$r_{xy} = \frac{r'_{xy}(\sigma_x/\sigma'_x)}{\sqrt{1 - {r'_{xy}}^2 + {r'_{xy}}^2(\sigma_x/\sigma'_x)^2}} \qquad (6.25)$$
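
Formula (6.25) runs the correction in the opposite direction, from the larger group to the restricted one. The Python sketch below applies it to the Regents' example and recovers, to two decimals, the coefficient originally observed in the selected group. The function name `restrict` is an assumption for this illustration, and the starting value .8633 is the unrounded result of (6.24) for that example.

```python
import math


def restrict(r_unrestricted, sd_restricted, sd_unrestricted):
    """Estimate the correlation in a restricted group from that in the larger
    group, following the pattern of formula (6.25)."""
    ratio = sd_restricted / sd_unrestricted
    r2 = r_unrestricted ** 2
    return r_unrestricted * ratio / math.sqrt(1.0 - r2 + r2 * ratio ** 2)


r_large = 0.8633  # approximate unrounded value given by (6.24) for the Regents' data
print(round(restrict(r_large, 5.0, 10.0), 2))
```

The same function serves for problems in which a coefficient found in a heterogeneous group is to be carried over to a more homogeneous one.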

Although the correction of $r_{xy}$ for effect of variability is mainly of 
theoretical interest, the effect of variability upon the magnitude of 
$r_{xy}$ is of great practical importance in interpreting $r_{xy}$. Before coefficients 
of correlation observed in two or more groups can be fairly 
compared, the groups must be comparable in variability. Before 
we can generalize information based upon an $r_{xy}$ observed in a given 
group to other groups, we must be sure the groups are reasonably 
alike in variability, or "range of talent." 

This means that a reported $r_{xy}$, like all other statistical measures, 
cannot be interpreted without knowledge of the situation in which it 
was observed. In particular, standard deviations are needed in 
interpreting $r_{xy}$ and should always be reported. If the standard 
deviations are very different from those ordinarily characterizing 
the variables under consideration, the fact should be emphasized. 


Exercises 


57. Describe several ways of interpreting $r_{xy}$ and describe a situation in 
which each way is appropriate. 

58. Describe a specific situation in which an $r_{xy}$ of, say, zero would be 
important. 

59. Plot the values of $r_{xy}$ and $\sqrt{1 - r_{xy}^2}$ of Table 6.12 in a graph. Interpret 
the resulting curve. 

60. What percentage of the variance of semester averages (Table 6.10) is 
accounted for by Regents' averages? What percentage is accounted for 
by variables other than Regents' averages? 

61. Given the scores below, compute the coefficient of correlation $r_{xy}$ and 
write the regression equation of Y on X. Predict $Y'$ for each X. Find 
the variances and standard deviations of the $Y'$ scores and of the 
residuals, $Y - Y'$. Numerically check equations (6.20) and (6.21). 
Also check the equation $\sigma_y^2 = \sigma_{y'}^2 + \sigma_{y.x}^2$. 


  X     Y 
 105    78 
 102    80 
  99    74 
  96    72 
  93    76 


62. In a relatively heterogeneous sample, the correlation between high 
school and freshmen college grades was found to be .72. If the standard 
deviation of high school grades for the group was 9.0, what correlation 
would be expected (other things being equal) in a college where selection 
on the basis of high school grades resulted in a freshman class in which 
the standard deviation of high school grades was 4.0? 


Relationships among Three or More Variables 


We have been studying simple product-moment correlation, or the 
correlation between two variables. It is sometimes the case that 
we want to examine the relationships among three or more variables. 
For example, we may want to determine the relationship of high 
school grades and intelligence test scores, taken as a team, to grades 
in college. 

There are two kinds of problems in studying mutual relationships 
among three or more variables. The first relates to determining the 
correlation between two of the variables, when the influence of the 
other (or others) is eliminated. This is the net or partial correlation 
problem. The second is that of determining the joint relationship 
of two or more variables to a third. This is the multiple regression 
and correlation problem. Although the two kinds of problems can 
be shown to be related, we shall find it simpler to consider them 
separately. 

Partial Correlation. It is well known that if each of two vari- 
ables is correlated with a third, the relationship between the two 
variables is affected by the third. For example, if we were to measure 
the mental age and height in a group of 109 normal children, spread 
evenly over a chronological age range from 12 to 120 months, we 
should be likely to find mental age and height strongly correlated. 
This is true because both mental age and height in children are 
strongly correlated with chronological age. As another example, 
hours of study and semester averages in high school or college are 
sometimes found to be negatively correlated. If this correlation 
were accepted at its face meaning, students who are making unsatis- 
factory marks would be advised to study less. When scholastic 
aptitude is taken into account, however, the correlation between 
hours of study and semester averages is substantial and positive. 
In studying the relationship between two variables which presum- 
ably is due or partly due to the effect of a third variable, ideally 
only individuals alike in respect to the third variable would be 
selected for the study. In the language of experimental research, 
the effect of the third variable would be controlled. This would 
mean in the examples above that only children alike in chrono- 
logical age and students alike in scholastic aptitude would be in- 
cluded in the correlation studies. 

Unfortunately, the experimental ideal is difficult to realize. For 
one thing, rigorous control on a variable may result in a very small 
sample. For another, it may be necessary or desirable to confine an 
investigation to an intact, heterogeneous group. It is frequently 
the case, particularly in preliminary investigation, that the best 
that can be done is to eliminate by statistical methods the effect 
of a third variable upon the correlation between two others. 

The product-moment coefficient of correlation between two variables 
$X_1$ and $X_2$ with the influence of a third variable $X_3$ eliminated 
by statistical methods is known as the partial correlation coefficient, 
and is written $r_{12.3}$. Partial correlation may be thought of as a special 
application of regression theory. We have seen that the deviations 
from the line of regression of Y on X indicate the extent to which 
variables other than X influence Y. These deviations represent 
values of Y independent of X and may appropriately be called 
residuals. When we have three variables, $X_1$, $X_2$, and $X_3$, the residuals 
or deviations from the line of regression of $X_1$ on $X_3$ represent 
values of $X_1$ independent of $X_3$, while the residuals from the line of 
regression of $X_2$ on $X_3$ represent values of $X_2$ independent of $X_3$. 
Hence, the correlation between the $X_1$ and $X_2$ residuals theoretically 
will be independent of the linear effect of $X_3$. 

To clarify the idea of partial correlation, let us look at the Arithmetic 
Problems Test scores, reading test scores, and mental ages 
recorded in Table 6.13, the variables being designated $X_1$, $X_2$, and 
$X_3$, respectively. It is of interest to ask whether reading ability is 
correlated with problem solving ability, independent of mental age, 
or whether the two are correlated because each is correlated with 
mental age. The means, standard deviations, correlation coefficients, 
and the regression equations of $X_1$ on $X_3$ and of $X_2$ on $X_3$ are 
shown at the foot of the table. It will be noted that these are calculated 
in the usual way, although they are carried out to a greater 
degree of accuracy than the data warrant. This is done to provide 
a check on theory. 


TABLE 6.13 
ARITHMETIC PROBLEMS AND READING TEST SCORES AND 
MENTAL AGES OF 23 EIGHTH-GRADE PUPILS 
(Data from table I, appendix B, school A) 

       ARITHMETIC   READING   MENTAL AGE X3 
       PROBLEMS X1    X2      (MONTHS - 130)     X1²      X2²      X3²     X1X2     X1X3     X2X3 

           11         19            4            121      361       16      209       44       76 
           26         44           50            676    1,936    2,500    1,144    1,300    2,200 
           10         24           27            100      576      729      240      270      648 
           22         43           34            484    1,849    1,156      946      748    1,462 
            9         33           15             81    1,089      225      297      135      495 
            6         29            9             36      841       81      174       54      261 
           15         36           20            225    1,296      400      540      300      720 
           14         25           26            196      625      676      350      364      650 
           21         29           17            441      841      289      609      357      493 
           16         23           11            256      529      121      368      176      253 
           14         31           31            196      961      961      434      434      961 
           16         33           28            256    1,089      784      528      448      924 
           14         38           15            196    1,444      225      532      210      570 
           10         18            5            100      324       25      180       50       90 
           23         33           31            529    1,089      961      759      713    1,023 
           21         33           34            441    1,089    1,156      693      714    1,122 
           19         39           15            361    1,521      225      741      285      585 
           11         33           15            121    1,089      225      363      165      495 
           18         37           13            324    1,369      169      666      234      481 
           16         32           15            256    1,024      225      512      240      480 
           16         36           31            256    1,296      961      576      496    1,116 
           16         37           27            256    1,369      729      592      432      999 
           14         40           40            196    1,600    1,600      560      560    1,600 

SUM       358        745          513          6,104   25,207   14,439   12,013    8,729   17,704 
MEAN    15.57      32.39   22.30 + 130 
σ        4.81       6.84        11.41 


TABLE 6.13 (Continued) 


By formula (6.3): 

Correlation between arithmetic problems and reading: 

$$r_{12} = \frac{(23 \times 12{,}013) - (358 \times 745)}{\sqrt{[(23 \times 6{,}104) - (358)^2][(23 \times 25{,}207) - (745)^2]}} = .551$$

Correlation between arithmetic problems and mental age: 

$$r_{13} = \frac{(23 \times 8{,}729) - (358 \times 513)}{\sqrt{[(23 \times 6{,}104) - (358)^2][(23 \times 14{,}439) - (513)^2]}} = .589$$

Correlation between reading and mental age: 

$$r_{23} = \frac{(23 \times 17{,}704) - (745 \times 513)}{\sqrt{[(23 \times 25{,}207) - (745)^2][(23 \times 14{,}439) - (513)^2]}} = .606$$

By formula (6.15): 

Regression of arithmetic problems on mental age: 

$$X_1' = .589\left(\frac{4.81}{11.41}\right)(X_3 - 152.30) + 15.57 = .248X_3 - 22.20$$

Regression of reading on mental age: 

$$X_2' = .606\left(\frac{6.84}{11.41}\right)(X_3 - 152.30) + 32.39 = .363X_3 - 22.89$$
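
The computations at the foot of Table 6.13 can be repeated from the column sums alone. The Python sketch below does so; the helper names `r_from_sums` and `sd_from_sums` are assumptions made for this illustration. Because it carries unrounded values throughout, the constant of the regression line may differ slightly in the second decimal from the figure printed above, which was obtained from rounded statistics.

```python
import math

N = 23
# Column sums from Table 6.13 (X3 coded as months - 130).
S1, S2, S3 = 358, 745, 513
S11, S22, S33 = 6104, 25207, 14439
S12, S13, S23 = 12013, 8729, 17704


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Raw-score product-moment correlation computed from column sums."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


def sd_from_sums(n, sx, sxx):
    """Standard deviation with divisor N, as used in the table."""
    return math.sqrt(sxx / n - (sx / n) ** 2)


r12 = r_from_sums(N, S1, S2, S11, S22, S12)
r13 = r_from_sums(N, S1, S3, S11, S33, S13)
r23 = r_from_sums(N, S2, S3, S22, S33, S23)
print(round(r12, 3), round(r13, 3), round(r23, 3))

# Slope and constant of the regression of X1 on X3, with X3 expressed in months.
slope = r13 * sd_from_sums(N, S1, S11) / sd_from_sums(N, S3, S33)
intercept = S1 / N - slope * (S3 / N + 130)
print(round(slope, 3), round(intercept, 2))
```

Coding mental age as months minus 130 changes the mean but leaves the standard deviation and the correlations untouched, which is why the coded sums can be used directly.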


The theoretical arithmetic problems and reading scores corresponding 
to the given mental ages, as computed from the respective 
regression equations, and the differences between theoretical and 
observed scores, or the residuals, are shown in Table 6.14. (Cf. Table 
6.11.) The residuals in the last two columns represent scores in 
problem solving and reading, respectively, theoretically independent 
of mental age, and their correlation theoretically will be uninfluenced 
by mental age. Since they are in deviation form, we may readily 
compute the product-moment coefficient of correlation between 
them by formula (6.2). The sum of products of the pairs of residuals 
is 147.1, and the sums of squares are 347.0 and 681.3, respectively, 
so that 

$$r_{12.3} = \frac{147.1}{\sqrt{347.0}\,\sqrt{681.3}} = .303,$$

the symbol $r_{12.3}$ indicating, as previously noted, the product-moment 
coefficient of correlation between $X_1$ and $X_2$ with the 
influence of $X_3$ eliminated, or the partial correlation coefficient. 
The $r_{12.3}$ of about .30, as compared to an $r_{12}$ of about .55, tells us 
that not all of the correlation between problem solving and reading 
can be explained by the fact that each is correlated with mental age. 


TABLE 6.14 
OBSERVED AND THEORETICAL ARITHMETIC PROBLEMS AND 
READING TEST SCORES AND RESIDUALS ABOUT REGRESSION 
LINES ON MENTAL AGE 
(Data from table 6.13) 

        OBSERVED SCORE           THEORETICAL SCORE          RESIDUAL 
      ARITHMETIC   READING 
      PROBLEMS X1    X2            X1'        X2'       X1 - X1'   X2 - X2' 

          11         19          11.03      25.75        -.03       -6.75 
          26         44          22.44      42.45        3.56        1.55 
          10         24          16.74      34.10       -6.74      -10.10 
          22         43          18.47      36.64        3.53        6.36 
           9         33          13.76      29.74       -4.76        3.26 
           6         29          12.27      27.57       -6.27        1.43 
          15         36          15.00      31.56         .00        4.44 
          14         25          16.49      33.74       -2.49       -8.74 
          21         29          14.26      30.47        6.74       -1.47 
          16         23          12.77      28.29        3.23       -5.29 
          14         31          17.73      35.55       -3.73       -4.55 
          16         33          16.98      34.46        -.98       -1.46 
          14         38          13.76      29.74         .24        8.26 
          10         18          11.28      26.12       -1.28       -8.12 
          23         33          17.73      35.55        5.27       -2.55 
          21         33          18.47      36.64        2.53       -3.64 
          19         39          13.76      29.74        5.24        9.26 
          11         33          13.76      29.74       -2.76        3.26 
          18         37          13.26      29.02        4.74        7.98 
          16         32          13.76      29.74        2.24        2.26 
          16         36          17.73      35.55       -1.73         .45 
          16         37          16.74      34.10        -.74        2.90 
          14         40          19.96      38.82       -5.96        1.18 

SUM      358        745         358.15 


The student will be pleased to know that there is a formula 
available which greatly simplifies the computation of $r_{12.3}$, after the 
simple coefficients $r_{12}$, $r_{13}$, and $r_{23}$ have been found. The formula, 
as derived in Ref. 7, is 

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \qquad (6.26)$$


When we substitute the three simple coefficients at the foot of 
Table 6.13 in the formula we have 

$$r_{12.3} = \frac{.551 - (.589)(.606)}{\sqrt{1 - (.589)^2}\,\sqrt{1 - (.606)^2}} = .302$$


Formulas similar to (6.26) for the partial correlation coefficient 
between $X_1$ and $X_3$ when $X_2$ is constant, and for $X_2$ and $X_3$ when 
$X_1$ is constant, are 

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \qquad (6.26')$$

$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} \qquad (6.26'')$$


Thus, to determine a partial coefficient of correlation, we need 
only to compute the simple coefficients between pairs of variables 
and substitute in the appropriate formula. 

The partial coefficients defined above are known as first-order 
coefficients. The simple coefficients $r_{12}$, etc. are known as zero-order 
coefficients. Hence, it may be said that first-order coefficients are 
computed from zero-order coefficients. 
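
The two routes to the partial coefficient described above, correlating the residuals and substituting zero-order coefficients in formula (6.26), can be compared directly. The Python sketch below works from the raw scores of Table 6.13, with mental age in months. The helper names `corr`, `residuals`, and `partial_r` are assumptions introduced for the illustration; since unrounded values are carried throughout, the result may differ in the third decimal from the figures in the text.

```python
import math

# Table 6.13: arithmetic problems (X1), reading (X2), mental age in months (X3).
X1 = [11, 26, 10, 22, 9, 6, 15, 14, 21, 16, 14, 16,
      14, 10, 23, 21, 19, 11, 18, 16, 16, 16, 14]
X2 = [19, 44, 24, 43, 33, 29, 36, 25, 29, 23, 31, 33,
      38, 18, 33, 33, 39, 33, 37, 32, 36, 37, 40]
X3 = [134, 180, 157, 164, 145, 139, 150, 156, 147, 141, 161, 158,
      145, 135, 161, 164, 145, 145, 143, 145, 161, 157, 170]


def mean(v):
    return sum(v) / len(v)


def corr(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Deviations of y from its least-squares regression line on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3, following formula (6.26)."""
    return (r12 - r13 * r23) / (math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2))


by_residuals = corr(residuals(X1, X3), residuals(X2, X3))
by_formula = partial_r(corr(X1, X2), corr(X1, X3), corr(X2, X3))
print(round(by_residuals, 3), round(by_formula, 3))
```

The two values agree to within rounding error of the machine, which is the numerical counterpart of the statement that the formula merely abbreviates the residual method.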

Use and Interpretation of Partial Correlation. The chief use 
of partial correlation is that of determining what the correlation 
between two variables would be if a third variable were not interfering 
with the relationship. The partial correlation coefficient $r_{12.3}$ 
may be thought of as a measure of the net correlation between $X_1$ 
and $X_2$, the influence of $X_3$ being eliminated. There are two important 
assumptions underlying the technique: (1) linearity of regression 
of the two variables upon the third variable and (2) equal 
scattering of the values of the two variables for different values of 
the third variable. The second assumption is analogous to the assumption 
of homoscedasticity in the two-variable problem. 

Neither of the assumptions will ordinarily be fully satisfied in 
practical applications; hence, a partial coefficient of correlation 
should be regarded as a sort of average value. The coefficient may 
obscure significant relationships between $X_1$ and $X_2$ for certain 
values of $X_3$. Specifically, if we had a large group of individuals 
measured on arithmetic problem solving ability, reading ability, and 
mental age, we might find the correlation between problem solving 
ability and reading ability at one level of mental age to be quite 
different from that at other levels. This is a serious limitation of 
partial correlation as a research tool, and rarely if ever can it be 
considered an acceptable substitute for experimental control. 

In spite of its limitations, however, partial correlation is useful, 
particularly in preliminary investigations of relationships. In any 
situation in which it is logical to think of experimentally controlling 
a variable which may or may not be interfering with the relationship 
between two others, partial correlation may be used. 

Several uses and interpretations of partial correlation are called 
for in exr. 63-67, p. 319. It is suggested that the student examine 
those exercises at this time. 

It is possible to determine the correlation between two variables 
with the linear effects of two other variables eliminated. The formula 
for the partial coefficient of correlation between $X_1$ and $X_2$ with $X_3$ 
and $X_4$ constant may be written in either of the following ways: 

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}} \qquad (6.27)$$

$$r_{12.34} = \frac{r_{12.4} - r_{13.4}r_{23.4}}{\sqrt{1 - r_{13.4}^2}\,\sqrt{1 - r_{23.4}^2}},$$

$r_{12.34}$ being known as a second-order coefficient. Thus, a second-order 
coefficient is calculated from first-order coefficients, and a first-order 
coefficient from zero-order coefficients. 
Partial correlation technique can be extended to eliminating the 
linear effects of more than two variables. (See Ref. 7, pp. 433-436.) 
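
That the two forms of (6.27) lead to the same second-order coefficient can be verified numerically. The following Python sketch builds the first-order coefficients with a single helper, here called `partial` (a name assumed for the illustration), and then applies it again. The six zero-order coefficients are those of the dental school example given later in this chapter.

```python
import math


def partial(r_ab, r_ac, r_bc):
    """Partial correlation of a and b with c held constant, in the pattern of (6.26)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Zero-order coefficients among four variables.
r12, r13, r14 = 0.58, 0.42, 0.49
r23, r24, r34 = 0.61, 0.15, -0.04

# First form of (6.27): eliminate X3 first, then X4.
via_3 = partial(partial(r12, r13, r23), partial(r14, r13, r34), partial(r24, r23, r34))
# Second form of (6.27): eliminate X4 first, then X3.
via_4 = partial(partial(r12, r14, r24), partial(r13, r14, r34), partial(r23, r24, r34))
print(round(via_3, 3), round(via_4, 3))
```

Applying the helper once more to second-order coefficients would give a third-order coefficient, which is the sense in which the technique extends to more variables.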
The Regression Equation in Three Variables. The second kind 
of correlation problem involving three or more variables is that of 
determining the regression equation of the linear relationship between 
one of the variables and the other variables considered as 
a team and of measuring the strength of the relationship. 

Consider first the three-variable case in which it is desired to 
determine the equation of the linear relationship between $X_1$, a 
dependent variable, and $X_2$ and $X_3$, independent variables. In 
practical work, $X_1$ might be problem-solving ability, $X_2$ reading 
ability, and $X_3$ mental age, as in Table 6.13. Or, $X_1$ might be semester 
averages of a group of college freshmen, $X_2$ their high school 
averages, and $X_3$ their scores in a scholastic aptitude test. As 
another example, $X_1$ might be measures of speed of typing, $X_2$ 
measures of finger dexterity, and $X_3$ measures of reaction time. 
The practical applications are numerous. 

If we plot $X_1$, $X_2$, and $X_3$ in a tridimensional scatter diagram 
the dots will tend to form an ellipsoid, if relationship exists. 
The equation which best summarizes (in the least squares sense) 
the relationship of $X_2$ and $X_3$ to $X_1$ will be the equation of the plane 
which cuts the ellipsoid in such a way that the sum of squares of the 
deviations of $X_1$ from the plane is a minimum. This plane is exactly 
analogous to the line which best fits the dots in a two-dimensional 
scatter diagram. It is called the plane of regression of $X_1$ on $X_2$, $X_3$. 

The equation of this best-fit plane is, in raw score form, 

$$X_1' = \beta_{12.3}\frac{\sigma_1}{\sigma_2}X_2 + \beta_{13.2}\frac{\sigma_1}{\sigma_3}X_3 + \left(\bar{X}_1 - \beta_{12.3}\frac{\sigma_1}{\sigma_2}\bar{X}_2 - \beta_{13.2}\frac{\sigma_1}{\sigma_3}\bar{X}_3\right) \qquad (6.28)$$

where $\beta_{12.3}$ and $\beta_{13.2}$* are defined by 

$$\beta_{12.3} = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} \qquad (6.29)$$

$$\beta_{13.2} = \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2} \qquad (6.29')$$

* These beta coefficients are partial regression coefficients. The symbol $\beta_{12.3}$, 
for example, refers to the net regression of $X_1$ on $X_2$ with $X_3$ held constant. 
In the four variable case, the symbol $\beta_{12.34}$ will refer to the net regression of 
$X_1$ on $X_2$ with $X_3$ and $X_4$ constant. 

Let us apply the formula in finding the equation of the plane of 
regression of arithmetic problems on reading and mental age, using 
the data of Table 6.13. The essential statistics are: 

                     X1                   X2            X3 
             ARITHMETIC PROBLEMS       READING      MENTAL AGE 
MEAN               15.57                32.39         152.30 
σ                   4.81                 6.84          11.41 

$r_{12} = .551$     $r_{13} = .589$     $r_{23} = .606$ 

Substituting in formulas (6.29) we obtain 

$$\beta_{12.3} = \frac{.551 - (.589)(.606)}{1 - (.606)^2} = .307,$$

$$\beta_{13.2} = \frac{.589 - (.551)(.606)}{1 - (.606)^2} = .403.$$

Substituting in (6.28), we have the regression equation 

$$X_1' = .307 \times \frac{4.81}{6.84}X_2 + .403 \times \frac{4.81}{11.41}X_3 + \left(15.57 - .307 \times \frac{4.81}{6.84} \times 32.39 - .403 \times \frac{4.81}{11.41} \times 152.30\right),$$

which simplifies to 

$$X_1' = .216X_2 + .170X_3 - 17.32.$$

By means of this equation we can determine the theoretical 
values of $X_1$ corresponding to given values of $X_2$ and $X_3$. For example, 
when $X_2 = 19$ and $X_3 = 134$, $X_1'$ is 

$$X_1' = .216(19) + .170(134) - 17.32 = 9.56.$$

The theoretical score 9.56 is the point on the regression plane corresponding 
to an $X_2$ of 19 and an $X_3$ of 134. The theoretical values $X_1'$ 
corresponding to the 23 given values of $X_2$ and $X_3$ are shown in 
Table 6.15. We shall consider these later in connection with the 
multiple correlation coefficient. 

TABLE 6.15 
OBSERVED AND THEORETICAL ARITHMETIC PROBLEMS TEST 
SCORES CORRESPONDING TO GIVEN READING SCORES AND 
MENTAL AGES AND DIFFERENCES BETWEEN THEM 

      READING   MENTAL AGE   ARITHMETIC     THEORETICAL   DIFFERENCE OR 
        X2          X3       PROBLEMS X1    SCORES X1'    RESIDUAL, X1 - X1' 

        19         134           11 
        44         180           26 
        24         157           10 
        43         164           22 
        33         145            9 
        29         139            6 
        36         150           15 
        25         156           14 
        29         147           21 
        23         141           16 
        31         161           14 
        33         158           16 
        38         145           14 
        18         135           10 
        33         161           23 
        33         164           21 
        39         145           19 
        33         145           11 
        37         143           18 
        32         145           16 
        36         161           16 
        37         157           16 
        40         170           14 

SUM                             358 
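
The steps of the three-variable problem, from the zero-order coefficients to a theoretical score, can be sketched in Python as follows. The function names `betas_three` and `plane` are assumptions introduced for the illustration. The sketch carries unrounded beta coefficients forward, so the constant and the theoretical score may differ by a unit or two in the second decimal from the values obtained above with rounded coefficients.

```python
def betas_three(r12, r13, r23):
    """Beta coefficients in the pattern of formulas (6.29): (beta12.3, beta13.2)."""
    d = 1 - r23 ** 2
    return (r12 - r13 * r23) / d, (r13 - r12 * r23) / d


def plane(means, sds, b12, b13):
    """Raw-score weights and constant of the regression plane, as in formula (6.28)."""
    m1, m2, m3 = means
    s1, s2, s3 = sds
    w2 = b12 * s1 / s2
    w3 = b13 * s1 / s3
    return w2, w3, m1 - w2 * m2 - w3 * m3


b12, b13 = betas_three(0.551, 0.589, 0.606)
w2, w3, c = plane((15.57, 32.39, 152.30), (4.81, 6.84, 11.41), b12, b13)
print(round(b12, 3), round(b13, 3))
print(round(w2, 3), round(w3, 3), round(c, 2))

# Theoretical arithmetic problems score for a reading score of 19 and a mental age of 134.
print(round(w2 * 19 + w3 * 134 + c, 2))
```

Applying the last line to each pair of reading score and mental age in Table 6.15 yields the theoretical scores, and subtracting these from the observed scores yields the residuals.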


The Regression Equation in Four Variables. The equation 
which best summarizes (in the least squares sense) the linear relationship 
between a dependent variable $X_1$ and three independent 
variables $X_2$, $X_3$, and $X_4$ is, in raw score form, 

$$X_1' = \beta_{12.34}\frac{\sigma_1}{\sigma_2}X_2 + \beta_{13.24}\frac{\sigma_1}{\sigma_3}X_3 + \beta_{14.23}\frac{\sigma_1}{\sigma_4}X_4 + \left(\bar{X}_1 - \beta_{12.34}\frac{\sigma_1}{\sigma_2}\bar{X}_2 - \beta_{13.24}\frac{\sigma_1}{\sigma_3}\bar{X}_3 - \beta_{14.23}\frac{\sigma_1}{\sigma_4}\bar{X}_4\right), \qquad (6.30)$$

in which the beta coefficients are defined by 

$$\beta_{12.34} = \frac{r_{12} + r_{14}r_{23}r_{34} + r_{13}r_{24}r_{34} - r_{12}r_{34}^2 - r_{13}r_{23} - r_{14}r_{24}}{1 + 2r_{23}r_{24}r_{34} - r_{23}^2 - r_{24}^2 - r_{34}^2} \qquad (6.31)$$

$$\beta_{13.24} = \frac{r_{13} + r_{12}r_{24}r_{34} + r_{14}r_{23}r_{24} - r_{13}r_{24}^2 - r_{12}r_{23} - r_{14}r_{34}}{1 + 2r_{23}r_{24}r_{34} - r_{23}^2 - r_{24}^2 - r_{34}^2} \qquad (6.31')$$

$$\beta_{14.23} = \frac{r_{14} + r_{13}r_{23}r_{24} + r_{12}r_{23}r_{34} - r_{14}r_{23}^2 - r_{12}r_{24} - r_{13}r_{34}}{1 + 2r_{23}r_{24}r_{34} - r_{23}^2 - r_{24}^2 - r_{34}^2} \qquad (6.31'')$$

Although formulas (6.30) and (6.31) involve a great deal of 
computational labor they are not difficult to use. Let us apply them 
in finding the equation which best summarizes the linear relationship 
between first-year dental school grades and three other variables 
in a class of 146 dental school students at the University of 
Pennsylvania. The variables and the statistics, which have been 
rounded to conserve space, are recorded below. 

          DENTAL SCHOOL   PREDENTAL    SCHOLASTIC    MECHANICAL 
          GRADES X1       GRADES X2    APTITUDE X3   APTITUDE X4 
MEAN         82.4           86.2          66.5          48.6 
σ             7.3            6.5          12.0           8.8 

$r_{12} = .58$     $r_{13} = .42$     $r_{14} = .49$ 
$r_{23} = .61$     $r_{24} = .15$     $r_{34} = -.04$ 

To find the values needed for our equation, we substitute in formulas 
(6.31) and have 

$$\beta_{12.34} = \frac{.58 + (.49)(.61)(-.04) + (.42)(.15)(-.04) - (.58)(-.04)^2 - (.42)(.61) - (.49)(.15)}{1 + 2(.61)(.15)(-.04) - (.61)^2 - (.15)^2 - (-.04)^2} = \frac{.235}{.596} = .39,$$

$$\beta_{13.24} = \frac{.42 + (.58)(.15)(-.04) + (.49)(.61)(.15) - (.42)(.15)^2 - (.58)(.61) - (.49)(-.04)}{.596} = \frac{.118}{.596} = .20,$$

$$\beta_{14.23} = \frac{.49 + (.42)(.61)(.15) + (.58)(.61)(-.04) - (.49)(.61)^2 - (.58)(.15) - (.42)(-.04)}{.596} = \frac{.262}{.596} = .44.$$

Substituting in formula (6.30) we obtain 

$$X_1' = .39 \times \frac{7.3}{6.5}X_2 + .20 \times \frac{7.3}{12.0}X_3 + .44 \times \frac{7.3}{8.8}X_4 + \left(82.4 - .39 \times \frac{7.3}{6.5} \times 86.2 - .20 \times \frac{7.3}{12.0} \times 66.5 - .44 \times \frac{7.3}{8.8} \times 48.6\right)$$

$$= .44X_2 + .12X_3 + .36X_4 + 19.0.$$

This is the equation of the hyperplane of regression of $X_1$ on $X_2$, $X_3$, 
$X_4$ for the given problem. By use of the equation we can find the 
theoretical dental school grade of a student, given his $X_2$, $X_3$, and $X_4$ 
scores. For example, the theoretical grade of a student for whom 
$X_2 = 85$, $X_3 = 75$, and $X_4 = 50$ is 

$$X_1' = .44(85) + .12(75) + .36(50) + 19.0 = 83.4.$$

This is the grade which would be observed for the given $X_2$, $X_3$, 
and $X_4$ values, if the relationship were linearly perfect. We shall 
return to this example a little later. 

It is possible to determine the regression equation which best 
summarizes the linear relationship between a dependent variable 
and four or more independent variables. (See Ref. 6, Chap. VIII, 
for the general case.) 
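
For the four-variable case, the labor of formulas (6.31) and (6.30) is conveniently delegated to a short program. The Python sketch below evaluates the three beta coefficients for the dental school data and then forms the raw-score equation. The function name `betas_four` is an assumption for this illustration, and because unrounded beta coefficients are carried forward, the constant differs somewhat from the 19.0 obtained above with rounded weights, although the theoretical grade agrees closely.

```python
def betas_four(r12, r13, r14, r23, r24, r34):
    """Beta coefficients in the pattern of formulas (6.31)."""
    d = 1 + 2 * r23 * r24 * r34 - r23 ** 2 - r24 ** 2 - r34 ** 2
    b2 = (r12 + r14 * r23 * r34 + r13 * r24 * r34
          - r12 * r34 ** 2 - r13 * r23 - r14 * r24) / d
    b3 = (r13 + r12 * r24 * r34 + r14 * r23 * r24
          - r13 * r24 ** 2 - r12 * r23 - r14 * r34) / d
    b4 = (r14 + r13 * r23 * r24 + r12 * r23 * r34
          - r14 * r23 ** 2 - r12 * r24 - r13 * r34) / d
    return b2, b3, b4


b2, b3, b4 = betas_four(0.58, 0.42, 0.49, 0.61, 0.15, -0.04)
print(round(b2, 2), round(b3, 2), round(b4, 2))

# Raw-score weights and constant, as in formula (6.30), for the dental school data.
s1, sds = 7.3, (6.5, 12.0, 8.8)
m1, means = 82.4, (86.2, 66.5, 48.6)
weights = [b * s1 / s for b, s in zip((b2, b3, b4), sds)]
constant = m1 - sum(w * m for w, m in zip(weights, means))

# Theoretical grade for X2 = 85, X3 = 75, X4 = 50.
grade = sum(w * x for w, x in zip(weights, (85, 75, 50))) + constant
print([round(w, 2) for w in weights], round(constant, 1), round(grade, 1))
```

With more than three independent variables the explicit formulas become unwieldy, and the beta coefficients are better obtained by solving the corresponding set of simultaneous equations.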

Coefficient of Multiple Correlation. In Table 6.15 we have 
the observed scores $X_1$ and the theoretical scores $X_1'$, the latter 
being obtained from the regression equation, $X_1' = .216X_2 + .170X_3 - 17.32$. 
We might actually calculate the coefficient of 
correlation between the observed and the theoretical scores. If we 
did this, we would have the statistic which is known as the coefficient 
of multiple correlation of $X_1$ with $X_2$ and $X_3$ and which is 
usually denoted by the symbol $R_{1.23}$. 

In general, the coefficient of multiple correlation is defined as the 
product-moment coefficient of correlation between the observed 
values of a variable $X_1$ and the theoretical values given by the 
equation of linear regression of $X_1$ on two or more other variables 
$X_2$, $X_3$, .... The coefficient of multiple correlation could always 
be obtained by finding the theoretical scores in $X_1$ by use of the 
appropriate regression equation and correlating these with the 
observed scores. This, however, is not necessary. It can be shown 
that the coefficient $R_{1.23 \ldots n}$ is given by the relationship 

$$R_{1.23 \ldots n} = \sqrt{r_{12}\beta_{12.34 \ldots n} + r_{13}\beta_{13.24 \ldots n} + \cdots + r_{1n}\beta_{1n.23 \ldots (n-1)}} \qquad (6.32)$$

Let us apply (6.32) in determining the multiple coefficient of correlation 
of arithmetic problems $X_1$ with reading $X_2$ and mental age 
$X_3$ in our three-variable illustrative problem. The values of $r_{12}$ 
and $\beta_{12.3}$, as computed above, are .551 and .307, respectively, and 
the values of $r_{13}$ and $\beta_{13.2}$ are .589 and .403. Substituting we have 

$$R_{1.23} = \sqrt{.551(.307) + .589(.403)} = .638.$$

By actually computing the coefficient of correlation between $X_1$ 
and $X_1'$ in Table 6.15, we get .638. 
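
The direct method just mentioned, in which the theoretical scores are found and then correlated with the observed scores, can be sketched in Python from the raw data of Table 6.15. The helper names `dev` and `dot` are assumptions made for the illustration. The plane is fitted by solving the two least-squares normal equations in deviation form, so the weights are unrounded versions of those in the regression equation above.

```python
import math

# Table 6.15: reading (X2), mental age in months (X3), arithmetic problems (X1).
X2 = [19, 44, 24, 43, 33, 29, 36, 25, 29, 23, 31, 33,
      38, 18, 33, 33, 39, 33, 37, 32, 36, 37, 40]
X3 = [134, 180, 157, 164, 145, 139, 150, 156, 147, 141, 161, 158,
      145, 135, 161, 164, 145, 145, 143, 145, 161, 157, 170]
X1 = [11, 26, 10, 22, 9, 6, 15, 14, 21, 16, 14, 16,
      14, 10, 23, 21, 19, 11, 18, 16, 16, 16, 14]


def dev(v):
    """Scores expressed as deviations from their mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


d1, d2, d3 = dev(X1), dev(X2), dev(X3)

# Least-squares weights from the two normal equations in deviation form.
s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
s12, s13 = dot(d1, d2), dot(d1, d3)
det = s22 * s33 - s23 ** 2
w2 = (s12 * s33 - s13 * s23) / det
w3 = (s13 * s22 - s12 * s23) / det

fitted = [w2 * a + w3 * b for a, b in zip(d2, d3)]  # theoretical scores, as deviations
resid = [y - f for y, f in zip(d1, fitted)]         # residuals X1 - X1'

R = dot(d1, fitted) / math.sqrt(dot(d1, d1) * dot(fitted, fitted))
sd_resid = math.sqrt(dot(resid, resid) / len(X1))
print(round(w2, 3), round(w3, 3), round(R, 3), round(sd_resid, 2))
```

The standard deviation of the residuals printed at the end anticipates the standard error of estimate taken up later in this chapter.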

To find the coefficient of multiple correlation of dental school 
grades $X_1$ with predental grades $X_2$, scholastic aptitude $X_3$, and 
mechanical aptitude $X_4$, we substitute the values 

$r_{12} = .58$, $r_{13} = .42$, $r_{14} = .49$, 
$\beta_{12.34} = .39$, $\beta_{13.24} = .20$, $\beta_{14.23} = .44$ 

in formula (6.32) and have 

$$R_{1.234} = \sqrt{.58(.39) + .42(.20) + .49(.44)} = .73.$$

The multiple coefficient may be interpreted in the same way as 
the simple coefficient $r$. It is the simple correlation between observed 
and theoretical values of a dependent variable. Since the 
theoretical values are obtained from the regression equation comprising 
the independent variables $X_2$, $X_3$, ..., the multiple 
coefficient indicates the extent to which variation of $X_1$ is associated 
with the joint variation of the independent variables. Thus, 
the square of $R$ indicates the proportion of variance of $X_1$ which 
is accounted for by $X_2$, $X_3$, .... In our three-variable example, 
$(.638)^2$ or about 41 per cent of the variance of Arithmetic Problems 
Test scores is accounted for by variation in reading and mental age, 
and about 59 per cent of the variance by variables other than these. 
(As an arithmetic check on theory, the student can compare the 
variance of $X_1'$ with the variance of $X_1$ in Table 6.15.) Similarly, 
we may say that $(.73)^2$ or about 53 per cent of the variance of 
dental school grades is accounted for by variation in predental 
grades, scholastic aptitude, and mechanical aptitude, while about 
47 per cent of the variance is associated with other variables. 

We shall have more to say about the interpretation of the multiple 
correlation and regression coefficients a little later. 
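
The substitutions in formula (6.32) for both illustrative problems may be checked with the Python sketch below. The function name `multiple_r` is an assumption for the illustration; it takes the zero-order coefficients and the matching beta coefficients in the same order.

```python
import math


def multiple_r(rs, betas):
    """Coefficient of multiple correlation in the pattern of formula (6.32)."""
    return math.sqrt(sum(r * b for r, b in zip(rs, betas)))


R_three = multiple_r((0.551, 0.589), (0.307, 0.403))
R_four = multiple_r((0.58, 0.42, 0.49), (0.39, 0.20, 0.44))
for name, R in (("R1.23", R_three), ("R1.234", R_four)):
    print(f"{name} = {R:.3f}, variance accounted for = {R * R:.0%}")
```

The quantity under the radical is itself the proportion of variance accounted for, so the squaring in the last line simply undoes the square root.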

The Standard Error of Estimate in Multiple Regression. 
In an earlier section, we found that the standard error of estimate 
cy.» in the two-variable problem was merely the standard deviation 
of the differences between the observed scores Y and the theoretical 
scores У”, or the standard deviation of the residual Y's about the 


, 


line of regression of Y on X. This standard deviation was shown 
to be equivalent to су у 1 — r2,. 

The standard error of estimate in the three-variable problem is
the standard deviation of the $X_1$ residuals about the plane of
regression of $X_1$ on $X_2$, $X_3$ and is denoted by $\sigma_{1.23}$. Going back to
Table 6.15, the theoretical values $X_1'$ shown there are points on the
plane of regression whose equation is $X_1' = .216X_2 + .170X_3 - 17.32$,
and the residuals $X_1 - X_1'$ are the deviations from the plane.
The standard deviation of these residuals is the standard error of
estimate $\sigma_{1.23}$. Here it is the standard error of estimating arithmetic
problems scores from known reading scores and mental ages.

When we have four or more variables, the standard error of
estimate is the standard deviation of the $X_1$ residuals about the
hyperplane of regression on $X_2, X_3, X_4, \ldots$ and is denoted $\sigma_{1.234\ldots}$.
As always in regression theory, the residuals $X_1 - X_1'$ are the
differences between the observed values of $X_1$ and the values which
would obtain if $X_1$ were perfectly correlated linearly with the team
of variables $X_2, X_3, X_4, \ldots$.

We could always find the standard error of estimate by actually
finding the standard deviation of the residuals $X_1 - X_1'$, but this
is not necessary. Analogously to formula (6.17), we may write


$$\sigma_{1.23 \ldots n} = \sigma_1 \sqrt{1 - R_{1.23 \ldots n}^2} \qquad (6.33)$$


In Table 6.15 the standard deviation of the residuals $X_1 - X_1'$ is,
by actual computation, 3.70. Remembering that for this problem
$\sigma_1 = 4.81$ and $R_{1.23} = .638$, we obtain by substitution in formula
(6.33)

$$\sigma_{1.23} = 4.81 \sqrt{1 - (.638)^2} = 3.70,$$

which is in exact agreement with the value obtained by the direct
method.

For the illustrative dental school data, we have $\sigma_1 = 7.3$ and
$R_{1.234} = .73$ so that the standard error of estimate is


$$\sigma_{1.234} = 7.3 \sqrt{1 - (.73)^2} = 5.0.$$
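The two substitutions in formula (6.33) can be checked mechanically. The short Python sketch below is only an arithmetic aid; the function name is our own and the inputs are the values quoted in the text.

```python
import math


def standard_error_of_estimate(sigma_1: float, multiple_r: float) -> float:
    """Formula (6.33): sigma_1 * sqrt(1 - R^2)."""
    if not 0.0 <= multiple_r <= 1.0:
        raise ValueError("multiple R must lie between 0 and 1")
    return sigma_1 * math.sqrt(1.0 - multiple_r ** 2)


# Three-variable example: sigma_1 = 4.81, R = .638
print(round(standard_error_of_estimate(4.81, 0.638), 2))  # 3.7

# Dental school data: sigma_1 = 7.3, R = .73
print(round(standard_error_of_estimate(7.3, 0.73), 1))  # 5.0
```

The same function serves for any number of independent variables, since only the standard deviation of the dependent variable and the multiple R enter the formula.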


The standard error of estimate in multiple regression and
correlation is interpreted exactly as in simple correlation. Since it is
always based upon the differences between the observed values of
$X_1$ and the theoretical values which would obtain if the correlation
between $X_1$ and the independent variables were perfect, it is an
informative and useful statistic, particularly in prediction.

Determining the Beta Coefficients and Regression Equations.
Since the multiple regression equation is essentially determined
from the zero-order correlation coefficients and the means
and standard deviations of the given variables, little needs to be
said regarding computation. In summary, in the three-variable
problem the beta coefficients are computed from formulas (6.29);
in the four-variable problem the beta coefficients are computed from
formulas (6.31). The regression equation for the three-variable
problem is given by (6.28) and for the four-variable problem by
(6.30). The coefficient of multiple correlation and the standard
error of estimate are obtained from formulas (6.32) and (6.33),
respectively.

Although there are as many possible regression equations as there 
are variables in a problem, only one equation ordinarily is of interest 
in practical work, since ordinarily only one of the variables may 
logically be thought of as dependent. 

It frequently is desired to compare an $R_{1.23}$ with an $R_{1.234}$, the
latter R being based upon the original three variables and one
additional. When an additional variable is introduced or when a
variable is dropped, the values of all of the beta coefficients will
generally be changed, since they are based upon the
interrelationships of all variables in the problem.

In finding beta coefficients and in determining regression equations,
the student is advised always to follow the models of the above
formulas. It is possible, of course, to interchange subscripts in the
formulas, but the practice is subject to error. The advice is
particularly appropriate in work involving more than four variables.
Models for more than four variables may be found in Ref. 6,
previously cited.

Regarding the number of digits to retain in the beta coefficients,
it is obvious that no more should be retained than in the correlation
coefficient. The accuracy we claimed for the correlation and beta
coefficients in our illustrative three-variable problem is not justified
by the data. It did serve, however, to give better arithmetic checks
on theory.

Uses of Multiple Regression and Correlation. The most com- 
mon application of multiple regression methods is found in predic- 
tion; in fact, multiple regression has been almost exclusively identi- 
fied with prediction in educational work. The dental school problem, 
discussed above, is a typical one. In such problems the dependent
variable is called the criterion and the independent variables the
predictors, following the usage in the two-variable problem. The
use of the multiple regression equation in prediction is entirely 
analogous to the use of the two-variable equation. After the correla- 
tions between a criterion variable and two or more predictor vari- 
ables have been observed, the multiple regression equation is deter- 
mined. The equation is then used to predict criterion scores, yet to 
be observed, from the observed scores in the predictor variables. 

The reliability of such prediction is judged by the standard error
of estimate, exactly as in the two-variable problem. In the dental
school example, we would predict a grade of 86.8 for an applicant
who had a predental grade of 90, a scholastic aptitude score of 70,
and a mechanical aptitude score of 55, since we would have
$X_1' = .44(90) + .12(70) + .36(55) + 19.0$. The standard error of estimate
attaching to the predicted score is 5.0, as determined previously,
and we would interpret the predicted grade and other predicted
grades accordingly. Since the assumptions and conditions
underlying prediction from the multiple regression equation and the
estimation of the reliability of prediction are similar to those underlying
prediction from the two-variable equation (see p. 287) no more
needs to be said about them.
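The prediction just described may be reproduced as a small Python sketch. The weights, constant, and standard error are those quoted in the text; the function names and the idea of reporting a band of one standard error on either side of the predicted grade are our own illustrative choices.

```python
# Raw-score regression weights and constant quoted in the text
# for the illustrative dental school data.
WEIGHT_PREDENTAL = 0.44
WEIGHT_SCHOLASTIC = 0.12
WEIGHT_MECHANICAL = 0.36
CONSTANT = 19.0
STANDARD_ERROR = 5.0  # standard error of estimate found from formula (6.33)


def predict_dental_grade(predental, scholastic, mechanical):
    """Predicted dental school grade from the three predictor scores."""
    return (WEIGHT_PREDENTAL * predental
            + WEIGHT_SCHOLASTIC * scholastic
            + WEIGHT_MECHANICAL * mechanical
            + CONSTANT)


def one_standard_error_band(predicted):
    """Predicted grade minus and plus one standard error of estimate."""
    return predicted - STANDARD_ERROR, predicted + STANDARD_ERROR


grade = predict_dental_grade(90, 70, 55)
low, high = one_standard_error_band(grade)
print(round(grade, 1))                # 86.8
print(round(low, 1), round(high, 1))  # 81.8 91.8
```

How much confidence to attach to the band depends on the assumptions of normality and homoscedasticity discussed for the two-variable case.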

One of the most notable limitations of the use of the multiple 
regression equation in prediction is due to the possibility that a 
high score in one predictor variable will offset a low score in another.
This is easily seen when we note the additive nature of the right 
member of the regression equation. Since a low score in one ability 
may in itself be predictive of failure, the regression equation should 
be used cautiously and should not be allowed to obscure important 
detail. As a rule, multiple cut-off methods and profile analysis 
should be used to supplement regression methods in prediction. 
(See Ref. 8, p. 195.) 
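The compensating effect, and the way a multiple cut-off rule supplements the regression equation, can be illustrated as follows. This is a minimal sketch: the regression weights are those of the dental school example, but the cut-off scores, the passing grade of 70, and the applicant are hypothetical values chosen only for illustration.

```python
def predicted_grade(predental, scholastic, mechanical):
    """Regression equation of the illustrative dental school data."""
    return 0.44 * predental + 0.12 * scholastic + 0.36 * mechanical + 19.0


# Hypothetical minimum acceptable score on each predictor (an assumption,
# not taken from the text).
CUTOFFS = {"predental": 70, "scholastic": 50, "mechanical": 40}


def screen(applicant, passing_grade=70.0):
    """Report the regression prediction together with any cut-offs missed."""
    grade = predicted_grade(**applicant)
    below = sorted(name for name, minimum in CUTOFFS.items()
                   if applicant[name] < minimum)
    return {
        "predicted": round(grade, 1),
        "passes_regression": grade >= passing_grade,
        "below_cutoff": below,
    }


# A hypothetical applicant whose high grades offset very low mechanical aptitude.
result = screen({"predental": 98, "scholastic": 90, "mechanical": 20})
print(result)
```

Here the predicted grade is comfortably above the passing mark, yet the applicant falls below the cut-off in mechanical aptitude, which is exactly the detail the additive equation tends to obscure.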

Closely related to prediction is the problem of selecting a limited 
number of tests from a greater number which gives a maximum 
multiple correlation with a criterion. Since this problem will nearly 
always involve more than four variables, we shall not deal with it 
here. (See Ref. 8, Chap. VII.) 

A third application of multiple regression methods involves the
residuals $X_1 - X_1'$ about the regression plane or surface. We have
already seen that these residuals theoretically are independent of
the effect of the independent variables; and in partial correlation
two series of residuals are correlated to yield a measure of net
relationship. Let us now consider possible uses of a single series of
residuals. Suppose we want to study factors that are associated with
“underachievement” and “overachievement” in high school or
college, defining the first as achievement poorer than expectation,
and the second as achievement better than expectation. Obviously,
our first job is that of defining expectation. We could define it in a
great many ways, but the most reasonable definition, perhaps, can
be made in terms of regression. Let us suppose that we find that
previous school grades, scholastic aptitude, and some other variable,
say skill in reading, are all correlated substantially with present
grades. We could then write the equation of regression of school
grades $X_1$ on previous grades $X_2$, scholastic aptitude $X_3$, and reading
test scores $X_4$. Using the equation we could predict a school grade
for each individual in the group given his $X_2$, $X_3$, and $X_4$ scores.
These predicted grades would define expectation. We could now
determine a series of residual scores $X_1 - X_1'$. If we select for case
or clinical study those students having relatively large negative
residual scores and those students having relatively large positive
residual scores, we would have two groups, one group comprising
students achieving well below expectation, the other comprising
students achieving well above expectation. Comparisons of the two
groups in respect to study habits, home conditions, vocational
inclinations, and so forth might bring out important differences.
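The selection procedure just outlined can be sketched in Python. Everything numerical here is hypothetical: the regression weights, the six student records, and the choice of one standard deviation of the residuals as the dividing line are assumptions made only to show the mechanics.

```python
import statistics


def expected_grade(previous, aptitude, reading):
    """Hypothetical regression equation defining expectation."""
    return 0.5 * previous + 0.2 * aptitude + 0.1 * reading + 10.0


# Hypothetical records: (student, present grade X1, previous grades X2,
# scholastic aptitude X3, reading score X4).
students = [
    ("A", 82, 80, 70, 60),
    ("B", 63, 70, 60, 50),
    ("C", 66, 90, 80, 70),
    ("D", 52, 60, 50, 40),
    ("E", 67, 75, 65, 55),
    ("F", 74, 85, 75, 65),
]

# Residual = observed grade minus the grade expected from the equation.
residuals = {name: x1 - expected_grade(x2, x3, x4)
             for name, x1, x2, x3, x4 in students}

# Spread of the residuals, used here as an arbitrary dividing line.
spread = statistics.pstdev(list(residuals.values()))

overachievers = [name for name, r in residuals.items() if r > spread]
underachievers = [name for name, r in residuals.items() if r < -spread]

print(overachievers)   # ['A']
print(underachievers)  # ['C']
```

In an actual study the weights would be determined from the observed correlations, and the size of residual that counts as "relatively large" would be a matter of judgment.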

Multiple regression and correlation methods can be used in any 
situation in which we have a relatively large sample of individuals 
measured on three or more variables, subject to the limitations to 
be noted later. The two questions of greatest practical concern 
relate to size of sample and the number of variables which may 
defensibly and profitably be retained in a regression equation. There 
are no very good answers to the questions, and we shall have to be 
content with ways of looking at them. 

The adequacy of size of the multivariate sample must finally be 
judged by the magnitude of sampling errors, a matter which we are 
not ready to take up. Common sense considerations, however, sug- 
gest that the sample must be large. In a two-variable problem, 
involving grouped data, we must have as a minimum two means to 
establish a regression line. If these means are reliable and
“well-spaced,” the regression line they determine will be a reasonably
good approximation to the true regression line in the population. 
If we set 10 data as the minimum number needed to yield a reliable 
mean, interpret “well-spaced” to mean separated by at least two 
columns in the correlation table, and allow for the tendency of the 
data to pile up near the center of the distribution, we shall need as a 
minimum about 50 data to establish a dependable regression line. 

In the three-variable problem, we must have a point or mean in 
the third dimension, and hence some 30 or 40 additional scores. 
Each additional variable would require a sharp increase in sample 
size, since each added variable would introduce a new dimension 
in the correlation space. Thus, the questions of size of sample and 
number of variables to be included are related. 

We shall later see that, although significance (in the sense of
indicating real relationships in a population) of multiple correlation
and beta coefficients may be established from relatively quite small
samples, there is so much uncertainty in regression equations based
upon such samples that they ordinarily have little practical utility.

It is typically the case in practical work that the sample is too 
small to justify retention of more than three or four variables in the 
regression equation. The selection of the best three or four variables 
will usually require tryout, but to some extent the matter can be
approached analytically. Let us look at the three-variable case. 

By a bit of algebraic manipulation of formulas (6.29) and (6.32)
we can get


$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2} \qquad (6.34)$$


From this expression two useful facts can be deduced: (1) When
$r_{23} = 0$, $R_{1.23}^2 = r_{12}^2 + r_{13}^2$, and (2) when $R_{1.23} = 1$,
$r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23} = 1$. These facts help in selecting the best
two independent variables among several for the regression equation.
The combination which yields the greatest $R_{1.23}$, of course, results in
the smallest standard error of estimate. (See formula [6.33].) Taken
together, the facts suggest that when two independent variables show
moderate correlation with the criterion, the two are promising if their
intercorrelation is either low or very high. The student should
insert arbitrary values in (6.34) until he is convinced that, other
things being equal, either low or very high values of $r_{23}$ contribute
the most to $R_{1.23}$. (The high values are mainly of theoretical interest,
since observed intercorrelations are rarely if ever high enough to
work.)
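The suggested experiment with arbitrary values in (6.34) is easily carried out by machine. In the sketch below the criterion correlations of .50 and .30 and the trial values of the intercorrelation are arbitrary choices of ours; the check on the consistency of the three coefficients is an added safeguard, not part of the formula.

```python
def multiple_r_squared(r12, r13, r23):
    """Formula (6.34) for the three-variable case."""
    if abs(r23) >= 1:
        raise ValueError("r23 must be numerically less than 1")
    # Three correlation coefficients cannot take just any values together;
    # this determinant is negative for an impossible combination.
    determinant = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    if determinant < 0:
        raise ValueError("the three coefficients are mutually inconsistent")
    return (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)


# Arbitrary criterion correlations, with the intercorrelation varied.
for r23 in (0.0, 0.3, 0.6, 0.9, 0.95):
    print(r23, round(multiple_r_squared(0.5, 0.3, r23), 3))
```

With these values the squared multiple correlation is .34 when the predictors are uncorrelated, falls as the intercorrelation rises to moderate values, and exceeds .34 again only when the intercorrelation is very high.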

Although the four-variable case is more difficult to analyze, the 
same sort of interplay among the variables is known to exist. In 
general when correlations of the independent variables with the 
criterion are moderate, as they usually are, the most promising 
variables for the regression equation are those which show low 
intercorrelations. The interplay among variables is complex, how- 
ever, and tryout is generally advisable. 

For several reasons the writer believes that, as a rule, three- and 
four-variable regression equations are more sensible and useful than
equations involving more variables. In the first place, three- and 
four-variable equations are more consistent with the size of the 
sample ordinarily available. In the second place, it is difficult to 
find additional variables in the practical situation which contribute 
significantly to R. In the third place, each additional variable adds 
enormously to computational labor. As a rule, the time might better 
be spent in considering the possible nonlinear relationships and pos- 
sible unique associations between the criterion variable and the 
individual independent variables at various points on the ranges of
the latter. Long regression equations tend to be more impressive in 
theory than in practice. 

There are situations, however, where it is desirable to examine the 
regression of a dependent variable on more than three independent 
variables. When this is the case, Fisher's methods (Ref. 1, pp. 156 ff.)
are more flexible and elegant than those referred to above. Fisher’s 
methods are particularly useful after one has a knowledge of sam- 
pling theory. They have the advantage of permitting thorough and 
precise analysis of the multivariate problem. 

Interpretation of Multiple Correlation and Regression
Coefficients. In the main, multiple R is interpreted in the same way
as $r_{xy}$. It enters into the standard error of estimate in exactly the
same way as does $r_{xy}$ and thus, like $r_{xy}$, indicates the efficiency of
prediction. In this connection, the values of Table 6.12 may be used
in interpreting R.

We have noted that $R^2$ indicates the proportion of variance of the
dependent variable which is explained by or accounted for by the
independent variables. This proportion of variance may be broken
up into parts which indicate the direct contributions of the
independent variables and the indirect contribution resulting from the
interrelationships between the variables. This may be seen by
examining $R^2$ in the three- and four-variable cases.

In the three-variable case it may be shown that


$$R_{1.23}^2 = \beta_{12.3}^2 + \beta_{13.2}^2 + 2 r_{23} \beta_{12.3} \beta_{13.2}. \qquad (6.35)$$


This expression indicates that the direct contribution of $X_2$ to the
explained variance of $X_1$ is equal to $\beta_{12.3}^2$; that of $X_3$ equal to $\beta_{13.2}^2$;
and that the indirect contribution, resulting from the
intercorrelations between the variables, is equal to $2 r_{23} \beta_{12.3} \beta_{13.2}$. Let us go back
to the arithmetic problems, reading, and mental age data to apply
this idea. For those data, $R_{1.23} = .638$, $\beta_{12.3} = .307$, $\beta_{13.2} = .403$, and
$r_{23} = .606$. Substituting these values in (6.35) we have approximately


$$(.638)^2 = (.307)^2 + (.403)^2 + 2(.606)(.307)(.403),$$
$$.407 = .094 + .162 + .150.$$


This tells us that, of the total variance of the arithmetic problems 
scores which is explained by the variation of reading scores and 
mental age, 9.4 per cent is explained by reading, 16.2 per cent by
mental age, and 15.0 per cent by the intercorrelation between the 
variables, 40.6 per cent in all. 
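The breakdown given by (6.35) can be verified with a few lines of Python. The values are those quoted for the arithmetic problems, reading, and mental age data; the function name is ours.

```python
def decompose_r_squared(beta_12_3, beta_13_2, r23):
    """Terms of formula (6.35): two direct contributions and the joint term."""
    direct_x2 = beta_12_3 ** 2
    direct_x3 = beta_13_2 ** 2
    joint = 2 * r23 * beta_12_3 * beta_13_2
    return direct_x2, direct_x3, joint


parts = decompose_r_squared(0.307, 0.403, 0.606)
print([round(p, 3) for p in parts])  # [0.094, 0.162, 0.15]
print(round(sum(parts), 3))          # 0.407
print(round(0.638 ** 2, 3))          # 0.407
```

The three rounded parts add to .406, while the unrounded sum agrees with $(.638)^2$ to three places; the small difference is due to rounding only.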

In the four-variable case $R_{1.234}^2$ may be expressed


$$R_{1.234}^2 = \beta_{12.34}^2 + \beta_{13.24}^2 + \beta_{14.23}^2 + 2 r_{23} \beta_{12.34} \beta_{13.24} + 2 r_{24} \beta_{12.34} \beta_{14.23} + 2 r_{34} \beta_{13.24} \beta_{14.23}. \qquad (6.36)$$


Here, the direct contributions of $X_2$, $X_3$, and $X_4$ to the explained
variance of $X_1$ are equal to the squares of the respective beta
coefficients, while the indirect contributions are indicated by the product
terms. Returning to the four-variable dental school data, let us
insert the values previously found in (6.36). We get, to two-figure
accuracy,


$$(.73)^2 = (.39)^2 + (.20)^2 + (.44)^2 + 2(.61)(.39)(.20) + 2(.15)(.39)(.44) + 2(-.04)(.20)(.44),$$
$$.53 = .15 + .04 + .19 + .10 + .05 - .01.$$


Hence, the total variance of dental school grades may be analyzed
or accounted for as follows:


15 per cent by correlation with predental grades,
4 per cent by correlation with scholastic aptitude scores,
19 per cent by correlation with mechanical aptitude scores,
10 per cent by indirect correlation with predental grades and
scholastic aptitude scores,
5 per cent by indirect correlation with predental grades and
mechanical aptitude scores,
-1 per cent by the dampening effect or interference due to the
negative correlation between scholastic and mechanical
aptitude scores,
47 per cent by correlation with variables other than predental
grades, scholastic aptitude scores, and mechanical aptitude
scores.


(Owing to rounding numbers, the percentages fail to sum to 100.)
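The same analysis can be written so that it serves for any number of independent variables. The sketch below applies it to the dental school values quoted above; the variable labels and the function name are ours, and the pattern of squared betas plus twice each product term is simply that of (6.35) and (6.36).

```python
from itertools import combinations


def variance_components(betas, intercorrelations):
    """Direct (squared beta) and indirect (product) terms as in (6.36)."""
    direct = {name: beta ** 2 for name, beta in betas.items()}
    joint = {}
    for first, second in combinations(betas, 2):
        r = intercorrelations[frozenset((first, second))]
        joint[(first, second)] = 2 * r * betas[first] * betas[second]
    return direct, joint


# Beta coefficients and intercorrelations quoted for the dental school data.
betas = {"predental": 0.39, "scholastic": 0.20, "mechanical": 0.44}
intercorrelations = {
    frozenset(("predental", "scholastic")): 0.61,
    frozenset(("predental", "mechanical")): 0.15,
    frozenset(("scholastic", "mechanical")): -0.04,
}

direct, joint = variance_components(betas, intercorrelations)
explained = sum(direct.values()) + sum(joint.values())

for name, value in direct.items():
    print(name, round(value, 2))
for pair, value in joint.items():
    print(pair, round(value, 2))
print("explained", round(explained, 2))        # explained 0.53
print("unexplained", round(1 - explained, 2))  # unexplained 0.47
```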


Care must be taken in using the beta coefficients in judging the
relative importance or weights of the independent variables as
contributors to the variance of the dependent or criterion variable.
Let us remark emphatically that the expressions of (6.35) and
(6.36) indicate that the betas must be squared and that they cannot
be interpreted without reference to the product terms. It is not
unusual to come across research reports in which the betas
(sometimes squared, sometimes not) are taken to indicate the importance
of predictor variables without reference to the interrelationships
which exist. To show the absurdity of this, let us refer to our dental
school data again. If we should examine only the squared betas
we might conclude that mechanical aptitude is considerably more
important than either predental grades or scholastic aptitude in
explaining, or possibly causing, dental school grades. Such a
conclusion would be manifestly unsound. The relatively large
intercorrelation between predental grades and scholastic aptitude reduces
the contribution of each to dental grades. The proper interpretation
is that when mechanical aptitude is combined with predental grades
and scholastic aptitude it accounts for 19 per cent of the total
variance of dental school grades. Like other statistics, beta
coefficients cannot be interpreted without taking into account the entire
situation in which they are observed.

The multiple coefficient R differs from $r_{xy}$ in one important
respect. The former is always positive or zero, as can be deduced
from its various definitions. Furthermore, the variance error of
estimate $\sigma_1^2(1 - R_{1.23\ldots}^2)$ cannot be greater than $\sigma_1^2(1 - r_{12}^2)$ or any
other zero-order variance error of estimate. Consequently, R cannot
be numerically less than any observed constituent zero-order
coefficient. This means that R tends to have a positive bias due to
chance fluctuations of the zero-order coefficients. The tendency is
augmented by the practice of selecting the highest observed
zero-order coefficients in estimating R. As a result R tends to be less
in the population than in the sample. The greater the number of
variables, sample size constant, the more the sample R tends to
exaggerate the population R.
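The positive bias of R can be seen in a small sampling experiment. The sketch below is our own illustration and rests on stated assumptions: three variables are drawn independently from a normal population, so that the population R is zero, and the sample R is computed from formula (6.34). The sample sizes, number of trials, and seed are arbitrary.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_multiple_r(n, rng):
    """Sample R of X1 on X2, X3 when all three are independent (population R = 0)."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x3 = [rng.gauss(0, 1) for _ in range(n)]
    r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
    r_squared = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
    return math.sqrt(max(r_squared, 0.0))  # guard against rounding below zero


def average_sample_r(n, trials=500, seed=1):
    """Mean of the sample R over repeated samples of size n."""
    rng = random.Random(seed)
    return sum(sample_multiple_r(n, rng) for _ in range(trials)) / trials


for n in (10, 30, 100):
    print(n, round(average_sample_r(n), 2))
```

Although no relationship exists in the population, the average sample R is well above zero, and it is larger the smaller the sample in relation to the number of variables.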

Concluding Remarks. The principal assumption underlying
correlation analysis is that the relationships between the various
pairs of variables are sensibly linear, i.e., that the data in the
correlation tables, from which the zero-order coefficients are computed,
are adequately described by the straight line law. The standard
error of estimate in multiple regression assumes normality and
homoscedasticity in certain columns or directions in the correlation
space; or, in other words, that the observed $X_1$'s corresponding to
one set of values of $X_2, X_3, \ldots$ are normally distributed and are
neither more nor less variable than those corresponding to any
other set of values. There is no satisfactory way to examine this
assumption, but when the distributions are sensibly normal and
the zero-order regressions linear, the assumption is warranted.

In conclusion, since the assumptions are rarely fully satisfied 
and since multiple correlation analysis is quite susceptible to 
sampling and measurement errors, multiple correlation and beta
coefficients must always be interpreted cautiously and with some 
skepticism, at least until they are confirmed by further samples 
from the same population. 


Exercises 


63. What are the advantages of partial correlation as a research tool? The
limitations? Suggest an educational problem which might be studied 
by partial correlation methods. 

64. An investigator found that when intelligence was held constant the 
correlation between years of schooling and size of salary dropped from 
.66 to .15. He concluded that intelligence rather than years of schooling 
was related causally to salary. Comment. 

65. In an investigation of the relationship between semester averages $X_1$,
hours of study per week $X_2$, and scholastic aptitude $X_3$, the zero-order
coefficients were found to be $r_{12} = -.10$, $r_{13} = .60$, $r_{23} = -.70$. Find
the correlation of semester averages with hours of study, scholastic
aptitude constant. Interpret this net correlation coefficient.

66. It is known that in small samples $r_{xy}$ fluctuates widely. In a partial
correlation study involving a sample of about 30, the following
coefficients were observed: $r_{12} = .20$, $r_{13} = .50$, $r_{23} = .30$. In a second small
sample these coefficients were observed: $r_{12} = .40$, $r_{13} = .30$, $r_{23} = .20$.
Compare the $r_{12.3}$ values for the samples. What does the comparison
suggest?

67. Referring to the illustrative dental school data, find the correlation 
between dental school grades and mechanical aptitude, with predental 
grades and scholastic aptitude constant. Interpret the result. 

68. In exr. 47, p. 281, the SAT scores and semester averages for 72 college
freshmen are given. For these same 72 freshmen, Regents' averages
and secondary high school class rank are available in Table II,
Appendix B. (a) Describe step by step how you would proceed to find
the equation of regression of semester averages on SAT, Regents'
averages, and class rank. (b) What assumptions would you make?
(c) How would you find the standard error of estimate? (d) What
assumption would you make in using the standard error of estimate in
interpreting predicted grades? (e) Determine the equation of regression
of semester averages on as many of the above variables as appear to
be normally distributed and linearly related to semester averages.

69. In the example in the text, arithmetic problems scores and mental ages
were correlated with $r_{13} = .59$. The correlation of arithmetic problems
with reading and mental age $R_{1.23}$ was .64. Compare the standard errors
of estimate in the two cases. Why does the addition of reading not add
more to R?

70. Using the regression equation determined for the illustrative dental
school data, predict grades for applicants A, B, C, and D for whom
the following data are available:


      PREDENTAL      SCHOLASTIC        MECHANICAL
      GRADE $X_2$    APTITUDE $X_3$    APTITUDE $X_4$
A     60             85                if
B     86             66                50
C     75             65                45
D     95             85                25


Using the standard error of estimate of 5.0, how sure can you be that
each applicant will pass, if the passing grade is 70? In what sense is
applicant A a poor risk? Applicant D?

71. The three-variable regression equation is sometimes written in terms
of partial correlation coefficients, standard errors of estimate, and
deviation scores as follows:


$$x_1' = r_{12.3} \frac{\sigma_{1.3}}{\sigma_{2.3}} x_2 + r_{13.2} \frac{\sigma_{1.2}}{\sigma_{3.2}} x_3$$


Reconcile this equation with equation (6.28).

72. Suggest an educational problem to which multiple regression and
correlation methods may be applied, noting the assumptions, limitations,
and interpretations of the methods as applied.


Curvilinear Relationship; the Correlation Ratio 


When variables X and Y are not correlated, $r_{xy}$ is zero and X
and Y are said to be independent in a statistical sense. It does not
follow, however, that when $r_{xy}$ is zero, X and Y are independent.
Since $r_{xy}$ measures linear relationship, an $r_{xy}$ of zero indicates only
that X and Y are not linearly related. When $r_{xy}$ is zero, it may be
that the data in the correlation table tend to follow a U- or J-curve,
or some other second- or higher-degree curve. Although the tendency
can ordinarily be detected by inspection of the scatter diagram, it
is useful to have a measure of the strength of curvilinear
relationship. One of the simplest and most important measures of such
relationship is the correlation ratio $\eta$ (eta).

The Correlation Ratio of Y on X. We have seen that $r_{xy}^2$ may
be defined by


$$r_{xy}^2 = 1 - \frac{\sigma_{y.x}^2}{\sigma_y^2}.$$


Now the quantity $\sigma_{y.x}^2$ is the variance of the deviations from the line
of regression of Y on X and $\sigma_y^2$ is the variance of the Y distribution.
In the correlation table, the former is the variance of the deviations
in the various columns from the predicted or theoretical means, since
the theoretical means are on the regression line. Hence, in words,
we may write

$$r_{xy}^2 = 1 - \frac{\text{variance of deviations from theoretical column means}}{\text{total variance of } Y}.$$

Suppose now that instead of the variance of the deviations from the
theoretical means in the columns, we have the variance of the
deviations from the actual means of the columns. Analogously to
the above, we may define $\eta_{yx}^2$ by

$$\eta_{yx}^2 = 1 - \frac{\text{variance of deviations from actual column means}}{\text{total variance of } Y}.$$

In symbols,

$$\eta_{yx}^2 = 1 - \frac{\sigma_{ay}^2}{\sigma_y^2} \qquad (6.37)$$

in which $\eta_{yx}$ denotes the correlation ratio of Y on X and $\sigma_{ay}^2$ the
variance* of deviations from column means.

The actual computation of $\eta_{yx}$ is an aid to understanding its
properties. We shall therefore first see how it is computed.

* It would be somewhat more appropriate to designate this variance as the
mean square of deviations from the column means, since it is really the
average variance within columns.
formula for $\eta_{xy}$ is


» _ Муха) — Gd 
"ay М> — (3f): 


(6.39) 


It is left as an exercise for the student to show that in the above
example $\eta_{xy}^2 = .703$ so that $\eta_{xy} = .84$.

Interpretation and Use of the Correlation Ratio. We shall
confine our remarks to the correlation ratio of Y on X, although
similar remarks obviously can be made about $\eta_{xy}$.

Several important properties of $\eta_{yx}$ can be deduced from definition
(6.37). As the ratio in the right-hand member decreases, $\eta_{yx}$
increases. If all of the scores are at their respective column means,
the ratio is zero, and $\eta_{yx} = 1$. On the other hand, if the scores scatter
about their respective column means as much as they scatter about
the mean of the total Y distribution, $\eta_{yx} = 0$. Thus, the limits of
$\eta_{yx}$ are 0 and 1. Since the sum of squares of deviations from the mean
in a column is a minimum, $\eta_{yx}$ is greater than $r_{xy}$, except in the
special case when the means of the columns lie on the regression
line. Since this never occurs in empirical data, $\eta_{yx}$ in practice is
always greater than $r_{xy}$.

Definition (6.37) also indicates that $\eta_{yx}$ is not independent of the
classification of data. As the number of classes increases, its value
increases. If the classification were so fine that only one score
appeared in a column, $\sigma_{ay}$ would be 0 and $\eta_{yx}$ would equal 1.
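These properties can be checked on a small set of scores. The Python sketch below computes the correlation ratio directly from definition (6.37), treating each distinct value of X as a column; the data are hypothetical and chosen to follow a U-shaped pattern, and the function names are ours. Sums of squares are used in place of variances, which leaves the ratio unchanged since both are divided by the same N.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(xs, ys):
    """Correlation ratio of Y on X from definition (6.37)."""
    columns = defaultdict(list)
    for x, y in zip(xs, ys):
        columns[x].append(y)
    grand_mean = sum(ys) / len(ys)
    total_ss = sum((y - grand_mean) ** 2 for y in ys)
    within_ss = 0.0
    for column in columns.values():
        column_mean = sum(column) / len(column)
        within_ss += sum((y - column_mean) ** 2 for y in column)
    return math.sqrt(max(0.0, 1 - within_ss / total_ss))


# Hypothetical U-shaped data: two scores in each of five columns.
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [9, 11, 4, 6, 1, 3, 4, 6, 9, 11]

print(round(pearson_r(xs, ys), 3))          # linear correlation is zero
print(round(correlation_ratio(xs, ys), 3))  # 0.953

# With only one score in each column the ratio is necessarily 1.
print(correlation_ratio([1, 2, 3, 4, 5], [3, 1, 4, 1, 5]))  # 1.0
```

The last line shows why a very fine classification makes the correlation ratio uninformative.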

In addition to being affected by changes in classification, the 
correlation ratio has a notable limitation. It gives no indication of 
the nature of the relationship, and it cannot be incorporated in a 
curvilinear regression equation. Any functional description of a 
relationship it may disclose must be made by other methods. When 
the correlation ratio is great as compared to $r_{xy}$, a second or
higher-degree curve may fit the data, but the ratio is of no value in
determining the equation of the curve.*

There are no formal assumptions underlying $\eta_{yx}$, but it would
hardly be appropriate to determine variance about column means
if the scattering of scores differed to any great extent from one
column to another. Hence, we may say that homoscedasticity is
implied. If the scores show marked differences in scatter from
column to column, mass analysis of relationships can neither be
logically made nor practically defended. The most important feature
of the data would be unequal scatter in columns.

* The student who is interested in fitting nonlinear curves will find the topic
treated in most elementary mathematical statistics texts.

In general, $\eta_{yx}$ is a measure of the concentration of scores about
their respective column means. It indicates whether there is a
tendency of the column means to follow some pattern, linear or
otherwise. The more pronounced the pattern, the greater the value
of $\eta_{yx}$. When the pattern is sensibly linear, $\eta_{yx}$ will not be very
different from $r_{xy}$. Several idealized situations and the relationship
of $\eta_{yx}$ to $r_{xy}$ in each are shown in Fig. 6.10.

[Fig. 6.10. Idealized patterns of means of columns in the correlation
table and relationship between $\eta_{yx}$ and $r_{xy}$ in each. Panels:
$\eta_{yx} > r_{xy}$; $\eta_{yx} > r_{xy}$; $\eta_{yx} = r_{xy} = 1$; $\eta_{yx} = r_{xy} = 0$.]

The chief use of the correlation ratio is in determining whether
regression may be considered linear. When the regression of Y on X
is linear, the difference between $\eta_{yx}^2$ and $r_{xy}^2$ is small enough to be
reasonably ascribed to chance fluctuations. The correlation ratio
is invaluable in this connection, since the interpretation and use of
$r_{xy}$ generally presuppose linearity. We return to tests of linearity in
a later chapter.


Exercises 


73. Show by consideration of theory or by sketching a correlation table
that the regression of Y on X may be linear, but the regression of X
on Y nonlinear.

74. Find $\eta_{yx}$ and $\eta_{xy}$ for the data of exr. 47, p. 281. Why is $\eta_{yx}$ of first
concern?

75. It is well known that the results from psychological experiments dealing
with learning, retention and forgetting, fatigue, and so on tend to be
nonlinearly related to either time or amount of practice, when either
is considered as the independent variable. Find or invent a set of such
data and discuss their analysis. What sort of mathematical description
of relationship would be appropriate?

76. Prove the following statement (or check arithmetically, using the data
of Table 6.16): The sum of the squares of the deviations of all the
values of Y from their general mean may be broken up into two parts,
one part comprising the sum of the squares of the deviations of the
means of the columns from the general mean, each multiplied by
the number in the column; and the other part comprising the sum of the
squares of the deviations of the Y's from their respective column
means.
Chapter VII
Reliability and Validity 
of Statistical Evidence 


IT IS the aim of statistical procedures to obtain trustworthy evi- 
dence which can be used in solving problems. This aim has been 
emphasized in the preceding chapters, but as yet little has been 
said about the conditions of trustworthy evidence. 

The ultimate test of evidence, of course, is to determine whether 
the generalizations it supports are useful in prediction, i.e., whether 
they enable us to say with some degree of confidence, “If this is 
done, that will happen.” But if this test were the only one available, 
a generalization would be little better than a “try-it-and-see”
suggestion.

Observational evidence is subject to error, and generalizations 
drawn therefrom consequently are in doubt to some extent. It usu- 
ally is possible, however, to prevent certain errors and to make 
allowances for those which cannot be prevented. When this can 
be done, the uncertainty of the generalizations is minimized. 

It is the peculiar advantage of statistics as a tool in research that 
it provides rational methods of estimating the extent to which 
observational evidence may be in error. We shall consider these 
methods in some detail in the present and following chapters. 


The Conditions of Trustworthy Evidence 


The most important condition of trustworthy evidence obviously 
is that it be relatively free from error. Statistical data are subject 
to three important kinds of errors. Errors of the first kind are those 
which are due to chance or sampling fluctuations and are known as 
sampling errors. Sample evidence, as has been emphasized at various
points in preceding pages, is always suspect to some extent because
of inevitable chance fluctuations. The mean of a sample from a 
specified population, for example, will seldom agree with the mean 
of a second sample from the same population, nor will it ordinarily 
be equal to the mean of the population. 

The second kind of error infesting statistical data comprises 
errors which commonly are called errors of observation or measure- 
ment. Errors of observation are defined as random errors made in 
the application of an instrument, such as a meter stick, spring bal- 
ance, educational test, or questionnaire, in attempting to determine 
some “dimension” of an object. As such, they include the approximation
errors, discussed in Chapter I, variable mistakes in reading
the instrument, errors resulting from variable conditions which 
affect the instrument or the object; in short, all of the inaccuracies 
due to influences present during the measuring process which affect 
the results unsystematically. These errors are primary, in that 
they are inherent in all original measurements. 

A third kind of error results when there are present in the measuring
process, or in the selection of a sample, systematic influences
which prejudice the observations. Errors of this type are commonly 
called constant errors or errors of bias. Bias may affect the original 
measurements, as would be the case if a distorted instrument were 
used, or it may affect sample evidence, if it enters into the selection 
of a sample. Unfortunately, there is little that can be done sta- 
tistically about errors due to bias. They may exist unknown to the 
researcher; they may arise through carelessness in sampling and 
measuring; they may result through failure to consider all of the 
available evidence pertinent to a question. As a rule, it is extremely 
difficult to make and justify even a rough correction for bias after 
the data are gathered. The best that can be done, perhaps, is to be 
critical of the measuring process and sampling procedures and 
thoughtfully skeptical about evidence after it has been collected. 

Although there is a very real connection between errors of ob- 
servation and sampling errors, it is convenient to deal with the two 
separately. In this chapter we shall consider errors of observation 
and related topics; sampling errors, as such, will be taken up in our 
chapter dealing with statistical inference.

There are two fundamental questions regarding the quality of
original observations, or any evidence for that matter: (1) whether
the evidence observed faithfully represents the situation it is sup- 
posed to represent or really means what it is considered to mean 
and (2) whether a second, independent observation will yield evi- 
dence consistent with the first. It is conventional to subsume con- 
siderations of the first question under the term validity, and those 
relating to the second under reliability. 

The Meaning of Validity. It is commonly said that the first 
condition of trustworthy evidence is that of validity. This condition 
is interpreted to mean that the evidence must be relevant to the 
issue it is supposed to throw light upon or that it must accomplish 
the purpose for which it is collected. If a set of historical facts truly 
picture some particular circumstance of the past, the facts are said 
to be valid; if an achievement test of, say, algebra really measures 
achievement in algebra, it is a valid test; if an intelligence test, or
some other aptitude test, is used to predict success in academic 
work or in a vocation, it is said to be valid if it really predicts 
success. Methods of collecting valid evidence are said to be valid. 
Thus, we may talk about either valid observations or valid instru- 
ments with no loss of meaning. 

It is useful to distinguish between two kinds of validity. The first 
we shall call formal validity. To have formal validity, evidence 
must agree, in nature and in method of collection, with specifica- 
tions that are set up on the basis of prior information. For example, 
if a test is constructed in accordance with definitions, with a body 
of content, or with opinions of authorities, it may be considered to 
be formally valid. If intelligence is defined as the ability to do 
certain paper and pencil tasks, such as arithmetic reasoning, sen- 
tence completion, and word definition, a test containing these tasks 
is a formally valid test of the intelligence of the individuals for 
whom it is designed. If an achievement test is in agreement with 
the content which was taught, it is formally valid in the given situation.
If, in the opinion of mental hygienists, a check list or questionnaire
relating to individual adjustment adequately covers the
aspects of adjustment in question, it is considered to be formally
valid.

The second kind of validity we shall call experimental. Evidence 
is experimentally valid if it accomplishes the purpose for which it is
gathered. For example, if interview, questionnaire, or aptitude test
data are gathered as evidence of individual qualifications for school 
or a job, the data are experimentally valid to the degree that they 
predict success or lack of success in school or on the job. Evidence 
gathered as a basis for revising a school curriculum in accordance 
with pupil needs is experimentally valid if it can be shown that the 
revised curriculum actually functions to meet pupil needs. The con- 
dition of experimental validity of evidence obviously is coextensive 
with the pragmatic test of the results of research, mentioned earlier. 

It will be noted that both kinds of validity are stated in terms of 
something outside of the evidence itself. Formal validity depends 
upon whether the evidence or the method of collecting the evidence 
agrees with specifications which have been set up in advance. 
Experimental validity depends upon whether the evidence fulfills 
its purpose. Thus, validity explicitly demands a criterion or criteria 
outside of the evidence. It follows that questions pertaining to 
validity are always specific. Evidence can never be said to be valid 
in a general sense. It is valid because it agrees with a particular set 
of specifications or because it accomplishes a particular purpose. 
Statements regarding the validity of evidence have meaning only 
when the criteria and the validation procedures are completely 
described. It must be kept in mind that criteria and validation 
procedures are themselves usually open to question. There are 
various ways of arriving at criteria, whether they be measures of 
academic or vocational success or formal specifications to be met. 
Hence, detailed description regarding how and why particular 
criteria are used is an essential step in reporting research.

There is a great deal of ambiguity and circularity in the concept 
of formal validity, perhaps even an element of medieval scholas- 
ticism. The definitions, specifications, or other criteria set up by one 
group of teachers, authorities, or experts will rarely agree with those 
set up by a second. Since the time of Galileo, the limitations of formal 
validity in research have been rather generally recognized. This fact 
is not as damaging, however, as it might first seem. If we keep in 
mind that the ultimate test of evidence is the pragmatic test, i.e.,
the demonstration that the evidence is useful in prediction and 
explanation, the principle of formal validity is a helpful one. Insofar 
as the principle enables us to eliminate guesswork and to exclude
irrelevant evidence or unpromising attempts to gather evidence, to
that extent it is of value. An aptitude test or questionnaire drawn up 
according to considered specifications presumably will work better 
than one which is not. The important thing is not to confuse formal 
validity with experimental validity and not to be content with 
formal validity. 

While experimental validity is neither ambiguous nor circular, it 
does demand criteria which may be difficult to provide. The prob- 
lem of measuring success in school or on the job is never easy, nor 
is it easy to evaluate the results of a program in action. The selection 
of criteria is generally a difficult job. (See Ref. 7.) 

When evidence can be statistically correlated with criteria, the 
coefficient of correlation is customarily designated the validity
coefficient. Any of the correlational methods discussed in the preceding
chapter may be used in measuring the relationship between evidence 
and criteria, provided, of course, the assumptions underlying a par- 
ticular method are satisfied. 

The concept of validity is fundamental and never to be forgotten, 
but in its present ramifications in educational research it is an 
elusive concept and one whose demands permit various interpreta- 
tions. In a last analysis, questions regarding the validity of evidence 
have to be resolved on a “try-it-and-see” basis. The writer has suggested
elsewhere (Ref. 13) that the concept of utility be substituted
for that of validity. It would seem straightforward and intelligible 
to consider evidence as useful or not useful for a specified purpose. 

The Meaning of Reliability. Research, as a method of solving 
problems, is unique in that its results are publicly verifiable or veri- 
fiable on demand. This means simply that the results can be checked 
and verified by any competent observer. The researcher is influential, 
not because he observes something no one else can see, but rather 
because he observes something others can see when it is brought to 
their attention. It may be said that a fundamental condition of 
evidence is that of dependability or reliability.

The evidence of research is said to be reliable if it can be verified 
by impartial, independent observers, or if it agrees with evidence 
obtained by independent repetitions of the process by which it was 
first obtained. For example, historical evidence or legal testimony 
is considered to be reliable if there is agreement between independent,
well-informed historians or witnesses. Educational test, interview,
or questionnaire data are considered reliable if a second application
of the procedure by which they were originally collected
yields data which are in agreement or consistent with the original 
data. A sample statistic, such as a mean or correlation coefficient, 
is reliable to the extent that it agrees with values yielded by further 
random samples from the same population. 

The idea of agreement or consistency between later and earlier
independent observations, although fundamental in the general 
understanding of reliability, is unclear because “agreement” and 
“consistency” are not defined. Several questions arise immediately. 
Does reliability imply perfect agreement? If not, how much dis- 
agreement can be tolerated? Since the best evidence of the present 
may be contravened, even demolished, by the evidence of the future, 
can any evidence be said to be reliable? 

First, let us note that, since all observation is characterized by 
error to some extent, observational evidence can never be said to 
be perfectly reliable. Second, reliability as it is used in research 
does not imply “unchanging” or “changeless.” The condition of 
agreement between earlier and later observations loses meaning if 
time or some other factor extrinsic to the measuring instrument or 
process changes (or has opportunity to change) the objects under 
observation. As was previously noted, measurement is an attempt 
to determine a “dimension” of an object. If that dimension is 
changing during the period of observation, a requirement of re- 
liability obviously would be disagreement between earlier and later 
observations. 

The concept of reliability, as it is understood in research, means 
only approximate agreement between independent observations of
an object or event, the only causes of disagreement admitted being 
errors of observation, discussed above. Other possible causes of dis- 
agreement, such as changing dimensions of objects or events and 
errors of bias in one or all observations, are of theoretical interest 
and of great practical concern in interpreting reliability, but if we 
attempted to take them into account in a definition of reliability, 
our definition would be hopelessly complicated. Reliable evidence, 
then, is evidence which is relatively free from errors of observation. 

It remains to give meaning to the phrases “approximate agreement”
and “relatively free from errors of observation,” as these
are used in connection with reliability. Our discussion of these 
phrases will be brought into better focus if we confine it to educa- 
tional tests and test scores, although what we have to say has wider 
application. 

The Coefficient of Reliability. The conventional methods of 
estimating the reliability of an educational testing process are 
based upon correlating scores obtained by (1) applying the same 
test twice to a given group, (2) administering two parallel forms of 
a test to a group, and (3) dividing a single test into equivalent 
halves. The correlation coefficient thus obtained indicates the extent 
of agreement between the two sets of observed scores, or the self- 
correlation of the test. A product-moment coefficient of correlation 
computed from the scores obtained by procedures (1) or (2) is 
called a reliability coefficient and is denoted by $r_{11}$ or just $r_1$. The
coefficient computed from (3) is known as the half-test or split-half
reliability coefficient and is denoted by $r_{\frac{1}{2}\frac{1}{2}}$ or just $r_{1/2}$.

Since the longer a test is, other things being equal, the more reliable
it is, the reliability coefficient computed from equivalent
halves underestimates the reliability of the whole test. There is a
simple formula available for estimating the reliability of the whole
test from the half-test coefficient, the Spearman-Brown prophecy
or step-up formula,

$$r_1 = \frac{2 r_{1/2}}{1 + r_{1/2}}. \tag{7.1}$$

By use of the formula we would obtain a reliability coefficient $r_1$ of
.79 from a half-test coefficient $r_{1/2}$ of .65, since we would have
$r_1 = 2(.65)/(1 + .65)$.

It is possible to estimate the reliability of a test n times as long 
as the test or part-test for which the reliability coefficient has been 
determined, provided the n parts are equivalent or truly comparable. 
The formula is

$$r_n = \frac{n r_1}{1 + (n - 1) r_1}, \tag{7.2}$$

in which $r_n$ is the estimated reliability coefficient of a test n times
as long as the test or part-test whose reliability coefficient is $r_1$. It
will be noted that when n = 2, formula (7.2) reduces to (7.1).
Formula (7.2) can be used to “step-down” as well as “step-up” 
a reliability coefficient. If it is desired to estimate the reliability 
of a test 1/n as long as the original test whose reliability is known, 
(7.2) may be solved for $r_1$. The formula may also be solved for n,
in case it is desired to estimate how much a test of known reliability 
needs to be lengthened to have a specified reliability. (See exrs. 
9-11.) 
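
The step-up, step-down, and lengthening uses of formula (7.2) can be sketched in a few lines of Python. This is only an illustration: the function names `spearman_brown` and `length_factor` are hypothetical labels chosen here, and the expression for n is simply (7.2) rearranged algebraically. The figures are those of the text's example (.65 stepped up to about .79).

```python
def spearman_brown(r1: float, n: float) -> float:
    """Formula (7.2): estimated reliability of a test n times as long
    as a test (or part-test) whose reliability coefficient is r1."""
    return n * r1 / (1 + (n - 1) * r1)


def length_factor(r1: float, r_target: float) -> float:
    """Formula (7.2) solved for n: how many times as long the test must
    be made to reach reliability r_target (assumes 0 < r1, r_target < 1)."""
    return r_target * (1 - r1) / (r1 * (1 - r_target))


# Step-up: half-test coefficient .65 -> whole-test estimate (n = 2, formula 7.1).
whole = spearman_brown(0.65, 2)
print(round(whole, 2))  # 0.79

# Step-down: a test half as long as the whole test (n = 1/2) returns .65.
print(round(spearman_brown(whole, 0.5), 2))  # 0.65

# Solving for n recovers the factor 2 that was used to step up.
print(round(length_factor(0.65, whole), 2))  # 2.0
```

All three results rest on the equivalence of the parts, so they carry the same reservations as the formula itself.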

Step-up and step-down procedures, as a rule, are not very satis- 
factory. This is because the condition of equivalence is rarely met. 
Equivalence implies that the scores from part-tests have equal 
means and standard deviations and that all parts measure the same 
thing. These requirements are rarely if ever satisfied in practice. 
Thus, the procedures are largely of theoretical interest. Stepped-up 
and stepped-down values should be thought of as rough approxima- 
tions awaiting experimental verification. 

Before leaving the three common methods of estimating the relia- 
bility of a test, we should note that, strictly speaking, only the test- 
retest method can be said to measure extent of agreement between 
repeated observations. Whether parallel forms or equivalent halves 
of a test measure the same thing is always debatable. The test-retest 
method, however, is at a disadvantage in measuring abilities which 
are sensitive to memory and learning effects. We shall have more to 
say about these methods later. At this point let us emphasize the 
fact that the estimates of reliability obtained by the different 
methods do not mean the same thing. The American Psychological 
Association (Ref. 1, p. 471) has recommended that a coefficient obtained
by the test-retest method be designated a coefficient of stability;
one obtained from the parallel-forms method a coefficient of
equivalence; and one obtained from the equivalent-halves method a 
coefficient of internal consistency. Whether or not the terms are 
adopted, the idea that the different methods of estimating reliability 
result in coefficients having somewhat different meaning is important.
(See paragraphs c, 4, p. 356.)

Correlational methods of determining the extent of agreement 
between independent observations have wide application. In any 
situation in which we have two series of observations of presumably 
the same trait or “dimension” of the individuals in a group, the 
reliability coefficient provides a quantitative statement of the
extent of agreement between observations. The magnitude of the
coefficient gives meaning to the phrase, "extent of agreement." 
When the coefficient is not zero, there is some agreement between 
the observations; as the coefficient approaches 1.00, better and 
better agreement is indicated. 

The correlational methods of estimating reliability are fundamental,
in the sense that all questions relating to reliability may be
thought of as questions regarding the extent of agreement between 
observations. However, the methods lead to misinterpretations of 
reliability, unless supplemented by understanding of errors of 
measurement. This will be the topic of our next section. 
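
As a rough illustration of the correlational approach, the following Python sketch computes a product-moment coefficient between two administrations of a test to the same group. The eight pairs of scores are invented for the illustration and do not come from the text; `pearson_r` is a hypothetical helper name.

```python
from statistics import fmean


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented scores of eight individuals on a test and on a retest.
first = [12, 15, 9, 20, 17, 11, 14, 18]
second = [13, 14, 10, 19, 18, 10, 15, 17]

r = pearson_r(first, second)
print(round(r, 2))  # 0.96
```

A coefficient near 1.00 here would be read as close agreement between the two sets of observed scores; whether it deserves the name reliability depends on the conditions discussed in the text.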


Exercises 


1. In a study to determine the number of working hours per week, a
national professional organization sent out about 15,000 questionnaires
to its members. About 2,500 of the questionnaires were returned,
showing an average working week of about 48 hours. Illustrate the
meaning of sampling, bias, and measurement errors in this situation.
(Other, more careful, studies actually show an average working week
of about 42 hours in this profession.)

2. From your own experience give examples of sampling, bias, and
measurement errors.

3. What seems to you to be the most serious limitation to formal validity?

4. If you were asked to rate a number of your acquaintances with respect
to, say, honesty, upon what would you base your ratings? How would
you determine whether the ratings were valid? Reliable?

5. A test constructor announced that he had constructed a highly valid
intelligence test. What information is needed to give the statement
meaning?

6. In each of the following hypothetical situations, claims to validity or
reliability or to both are made. Study each and determine in what sense
validity or reliability exists. Point out the flaws in the claims.

   a. A researcher announced that he had constructed a perfectly reliable
   test, since each individual in a large group made the same score on
   the retest as he did on the test.

   b. A guidance counselor gave an adjustment inventory to a high school
   freshman class in the fall and again in the spring. The results were
   in close agreement, and the counselor concluded that the test was
   reliable.

   c. From the writings of five well-known, progressive educators, a
   researcher derived items for a check list to evaluate the extent to
   which high school curricula met the needs of youth. He submitted
   the check list to 50 progressive educators. Since the 50 agreed that
   the items on the check list measured the extent to which high school
   curricula were meeting the needs of youth, the researcher concluded
   that the list was valid and reliable.

   d. As his doctorate dissertation, a student presented “A Proposed
   Revision of the State Public School Aid Law.” He had submitted
   his proposal to a “jury” of experts. Since the experts were in
   substantial agreement, he claimed that the proposal was valid.

7. The scores of 32 graduate students on a mathematics background test
and on a statistics achievement test are shown in Table 7.3. How would
you determine the coefficient of validity of the mathematics test as a
predictor of achievement in statistics? How would you determine the
reliability coefficient of the mathematics test?

8. Twenty-four teachers in an elementary school were rated by two
supervisors. If the ratings were numerical and independent, how could
their reliability be estimated?

9. What assumptions are made when the reliability of a test is estimated
from half-test scores? How should the test be divided? If $r_{1/2}$ = .60, what
is the estimate of $r_1$?

10. An achievement test is reduced to one-fourth of its original length.
If the original test had a reliability coefficient of .96, what coefficient
would be expected for the quarter-test, assuming the quarter-test
comparable to the original?

11. Suppose that an adjustment inventory requires 1 hr. to administer
and that its reliability for a given group is .40. How many hours long
would the inventory have to be to give a reliability of .90 for the group,
assuming each hour's work equivalent?

12. Describe a situation, not mentioned in the text, in which each of the
following would be appropriate.

   a. Formal validation.
   b. Experimental validation.
   c. Test-retest estimation of reliability.
   d. Parallel-forms estimation of reliability.
   e. Equivalent-halves estimation of reliability.
   f. Correction of a reliability coefficient for lengthening a test.
   g. Correction of a reliability coefficient for shortening a test.


Relations between Errors of Measurement and Reliability


As was noted in Chapter I, measurement results in a number 
which is taken to represent some property of a thing. When we 
measure an individual, we seek a numerical value which may be
considered to represent the height, weight, intelligence, opinion, or
some other property of the individual. This numerical value or
observed score is said to be reliable if repeated observations yield
consistent results, under certain specified conditions.

In order to understand the assumptions underlying reliability 
theory and to appreciate the consequences that result when the 
assumptions are not satisfied, it is necessary for us to approach 
the topic from a theoretical point of view. 

Observed Scores, True Scores, and Errors of Measurement. 
The most comprehensive approach to the concept of reliability is 
found in thinking of an observed score as representing a theoretically 
correct or “true” value, plus an error of observation or measurement.
If we let $X_o$ be an observed score, $X_t$ the true score, and $E$
the error of measurement, we may write

$$X_o = X_t + E. \tag{7.3}$$

The smaller the error $E$, of course, the more closely $X_o$ approximates
$X_t$; if there is no error, $X_o = X_t$. Unfortunately, we never
know either the true score $X_t$ or the error $E$, but we may think of
a true score as the arithmetic mean of a very large number of repeated
observations. For example, we may think of the true intelligence
of an individual, as measured by a test, as the mean of a very
large number of scores obtained by repeating the test, assuming the
individual unchanged by the process.

Although the concepts of true score and error of measurement 
are entirely hypothetical, we shall find them invaluable in coming to 
grips with reliability theory. It will help to clarify the concepts if 
we think about what a laboratory technician does when he wants 
to determine, say, the correct or true weight of a substance. It is
standard practice to weigh the substance repeatedly and to take
the arithmetic mean of the weights observed. When conditions
which affect the weight systematically are controlled, so that the
errors in successive observations are truly random, the arithmetic
mean of the observed weights is considered to be the true weight
of the substance. (Cf. exr. 18, Chapter III.) Before leaving the
example, let us note another standard laboratory practice. After
the mean or hypothetical true weight is determined, the mean 
deviation or the standard deviation of the observed weights would 
be reported as an index of the precision of the weighing process. 

It rarely is possible in measuring a “dimension” of a human being 
to repeat the measuring process under controlled conditions, and 
thereby to estimate the true score and magnitude of errors directly.
The measuring process may change the individual, and the change
resulting from one trial may carry over into a second. Various other
influences may result in an actual change of true score or in corre- 
lation between errors during successive repetitions of the process. 
Hence, we ordinarily must estimate true scores from the results of 
only one or two measurements and the extent of error in the process 
in a gross or aggregate sort of way. The fact that this can be done 
rationally is of great importance in educational measurement. 

In approaching this matter, it will be helpful to think again about 
the work of the laboratory technician. Let us now suppose that the 
technician wishes to determine the reliability of a weighing process 
or the reliability of the weights obtained by the process (these come 
to the same thing) by weighing a number of objects whose weights 
have previously been determined to a high degree of accuracy. To
simplify the illustration let us suppose that he weighs 5 such objects
and that the results are as shown below:


          KNOWN WEIGHT   OBSERVED WEIGHT      ERROR
          (MICROGRAMS)     (MICROGRAMS)    (MICROGRAMS)

               10               12               2
               20               19              -1
               30               27              -3
               40               41               1
               50               51               1

SUM           150              150               0
MEAN           30               30               0
σ²            200              203.2             3.2
σ                                                1.8


It will be noted that the mean of the known weights is equal to the
mean of the observed weights and that the variance of the observed 
weights is the sum of the variances of the known weights and the 
errors. This is true because the errors in our illustration are truly 
compensating and uncorrelated with the true weights. We shall 
emphasize this fact a little later. The standard deviation of the 
errors is 1.8 micrograms. Now suppose that a much larger num- 
ber of objects had been weighed and that the standard devi- 
ation of the errors had been the same. Assuming the errors normally 
distributed, the standard deviation would provide an index of the 
reliability of the weighing process and could be used to estimate 
confidence limits of the true weight of an object. The error in the 
weight of a comparable object weighed under similar conditions 
would almost certainly be less in absolute value than 3 × 1.8
micrograms. About 95 per cent of the time the error would be less
than 2 × 1.8 micrograms and about 68 per cent of the time less
than 1.8. (Why?)
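
The arithmetic of the weighing illustration can be checked with a short Python sketch. It uses the five known and observed weights tabled above and the standard-library `statistics` module; the helper `prob_within` is a hypothetical name, and the percentages it returns hold only under the stated assumption that the errors are normally distributed.

```python
from statistics import NormalDist, fmean, pvariance

# Known and observed weights (micrograms) from the weighing illustration.
known = [10, 20, 30, 40, 50]
observed = [12, 19, 27, 41, 51]
errors = [o - k for o, k in zip(observed, known)]

var_known = pvariance(known)        # 200
var_observed = pvariance(observed)  # 203.2
var_errors = pvariance(errors)      # 3.2
sd_errors = var_errors ** 0.5       # about 1.8

# Covariance of known weights with errors; zero means "uncorrelated".
mk, me = fmean(known), fmean(errors)
cov = fmean([(k - mk) * (e - me) for k, e in zip(known, errors)])


def prob_within(k: float) -> float:
    """Probability that a normally distributed error lies within
    k standard deviations of zero."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


print(var_known + var_errors, var_observed)  # the two agree: 203.2
print(round(sd_errors, 1), cov)              # 1.8 and 0.0
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))       # 0.6827, 0.9545, 0.9973
```

The last three figures are the source of the "about 68 per cent," "about 95 per cent," and "almost certainly" statements in the text.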

We can parallel the above illustration by imagining we know both 
the true scores and the observed scores of 20 individuals, and that 
the errors of measurement are perfectly compensating and uncorrelated
with the true scores. If we actually knew the true scores, of
course, we would not need to measure the individuals, but that
thought need not detract from our development of reliability theory.
The supposititious data for the 20 individuals are shown in Table 7.1.

The student can verify that the means and standard deviations
of true scores, observed scores, and errors are as shown at the foot
of the table and that there is no correlation between true scores and
errors. It will be noted that the variance of the observed scores is
equal to the sum of the variances of true scores and errors. When
errors are uncorrelated with true scores, this relationship is always
true, as demonstrated in Appendix A, and we may write

$$\sigma_o^2 = \sigma_t^2 + \sigma_e^2, \tag{7.4}$$

in which $\sigma_o^2$, $\sigma_t^2$, and $\sigma_e^2$ are the variances of observed scores,
true scores, and errors, respectively. By transposing and dividing
by $\sigma_o^2$ we obtain

$$\frac{\sigma_t^2}{\sigma_o^2} = 1 - \frac{\sigma_e^2}{\sigma_o^2}. \tag{7.5}$$


It will furthermore be noted that the mean of the true scores in 
Table 7.1 is equal to the mean of the observed scores. These facts
are true because (1) the errors are uncorrelated with true scores and
(2) the errors are perfectly compensating. The italicized conditions
must never be forgotten. They are of utmost importance in understanding
and interpreting reliability.


TABLE 7.1
THEORETICAL RELATIONS BETWEEN TRUE
SCORES, OBSERVED SCORES AND ERRORS
OF MEASUREMENT


        TRUE SCORE   OBSERVED SCORE   ERROR
           X_t            X_o           E

            18             17          -1
            37             35          -2
            28             28           0
            31             37           6
            42             44           2
            36             36           0
            11             15           4
            32             27          -5
            24             25           1
            13             14           1
            21             14          -7
            22             21          -1
            15             18           3
            18             16          -2
            33             38           5
            27             23          -4
            26             28           2
            34             34           0
            25             22          -3
            27             28           1

SUM        520            520           0
MEAN      26.0           26.0           0
σ²        67.3           77.6        10.3
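
The footings of Table 7.1 can be verified mechanically. The Python sketch below is one way of doing so; it enters the true scores and errors as tabled, forms each observed score as true score plus error, and uses population variances (division by N), which is the convention that reproduces the tabled values.

```python
from statistics import fmean, pvariance

# True scores and errors of measurement for the 20 individuals of Table 7.1.
true_scores = [18, 37, 28, 31, 42, 36, 11, 32, 24, 13,
               21, 22, 15, 18, 33, 27, 26, 34, 25, 27]
errors = [-1, -2, 0, 6, 2, 0, 4, -5, 1, 1,
          -7, -1, 3, -2, 5, -4, 2, 0, -3, 1]

# Each observed score is the true score plus its error.
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = pvariance(true_scores)  # 67.3
var_obs = pvariance(observed)      # 77.6
var_err = pvariance(errors)        # 10.3

# Covariance between true scores and errors (zero: uncorrelated).
mt, me = fmean(true_scores), fmean(errors)
cov_te = fmean([(t - mt) * (e - me) for t, e in zip(true_scores, errors)])

print(sum(true_scores), sum(observed), sum(errors))          # 520 520 0
print(round(var_true, 1), round(var_obs, 1), round(var_err, 1))  # 67.3 77.6 10.3
print(cov_te)                                                # 0.0
```

Because the covariance is zero and the errors sum to zero, the observed-score variance equals the sum of the other two variances and the two means coincide.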


The Standard Error of Measurement and the Reliability 
Coefficient. We are now in position to examine the reliability 
coefficient as a function of errors of measurement. 

Since in a very real sense an observed score is dependent upon a
true score, let us consider the regression of observed scores on true
scores. In Fig. 7.1 the true scores and the observed scores of Table
7.1 are plotted on the horizontal and vertical scales, respectively.

Fig. 7.1. Regression of observed scores on true scores. Vertical
distances of observed scores from regression line are errors of
measurement. (See Table 7.1.)

In the previous chapter, we learned that it is possible to summarize
the relationship between two variables by the regression line and
the standard deviation of the deviations from the regression line,
the latter being the standard error of estimate. We are not here
interested in the equation of the regression line, since in the real
situation we do not have true scores from which to estimate observed
scores, but we are interested in the standard error of estimate.
The deviations from the line of regression of observed scores
on true scores obviously are the errors of measurement of column 3,
Table 7.1. Following equation (6.19), we have, for the general case,


$$\sigma_e^2 = \sigma_x^2\,(1 - r_{x\infty}^2). \tag{7.6}$$


Analogously to equation (6.21) we may write
$r_{x\infty}^2 = \sigma_\infty^2/\sigma_x^2$. Since, as demonstrated in
Appendix A, the reliability coefficient $r_{11}$ also equals the ratio
of the true score variance to obtained score variance, i.e.,


$$r_{11} = \frac{\sigma_\infty^2}{\sigma_x^2}, \tag{7.7}$$


we have $r_{x\infty}^2 = r_{11}$.* By substitution in (7.6) we obtain


$$\sigma_e^2 = \sigma_x^2\,(1 - r_{11}). \tag{7.8}$$


The standard deviation of the errors of measurement $\sigma_e$ is, after
taking square roots in (7.8),


$$\sigma_e = \sigma_x\sqrt{1 - r_{11}}. \tag{7.9}$$


If we solve (7.8) for $r_{11}$ we get


$$r_{11} = 1 - \frac{\sigma_e^2}{\sigma_x^2}, \tag{7.10}$$


a relationship also evident from (7.5) and (7.7).
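
The relations (7.7), (7.9), and (7.10) can be checked against the hypothetical scores of Table 7.1. The following is a minimal Python sketch, assuming population variances (division by N) as in the table; the function and variable names are illustrative only and do not come from the text.

```python
import math

# True scores and errors of measurement from Table 7.1 (20 cases).
true_scores = [18, 37, 28, 31, 42, 36, 11, 32, 24, 13,
               21, 22, 15, 18, 33, 27, 26, 34, 25, 27]
errors = [-1, -2, 0, 6, 2, 0, 4, -5, 1, 1,
          -7, -1, 3, -2, 5, -4, 2, 0, -3, 1]
# Each observed score is the true score plus its error.
observed = [t + e for t, e in zip(true_scores, errors)]


def variance(values):
    """Population variance (divide by N), matching the table."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


var_true = variance(true_scores)
var_obs = variance(observed)
var_err = variance(errors)

r11_from_true = var_true / var_obs        # equation (7.7)
r11_from_error = 1 - var_err / var_obs    # equation (7.10)
sigma_e = math.sqrt(var_obs) * math.sqrt(1 - r11_from_true)  # equation (7.9)

print(round(var_true, 1), round(var_obs, 1), round(var_err, 1))
print(round(r11_from_true, 3), round(r11_from_error, 3), round(sigma_e, 2))
```

For these data the two expressions for the reliability coefficient agree (about .87), and (7.9) returns the standard deviation of the error column (about 3.2). The agreement holds here because the errors in the table happen to be exactly uncorrelated with the true scores.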

Thus, the relation of the reliability coefficient to observed score
and true score variances may be expressed by (7.7), and its relation 
to observed score and error variances may be expressed by (7.10), 
provided the errors of measurement are uncorrelated with true scores. 
We shall return to these relations later. 

* This is an instructive relationship. It tells us that the reliability
coefficient is the coefficient of determination or the proportion of
observed score variance accounted for by true score variance.

Methods of Estimating the Standard Error of Measurement.
Our discussion of errors of measurement to this point has
been entirely theoretical. This has been necessary in order to show
and emphasize the relationship between the reliability coefficient
and the standard error of measurement and the conditions under
which the concept of reliability has clear-cut meaning. We now
turn to the practical problem of estimating the standard error of
measurement. If we knew $r_{11}$, we could, of course, determine
$\sigma_e$ from equation (7.9), but for several reasons a direct
estimation of $\sigma_e$ is advisable.

We have seen that in order to estimate $r_{11}$, we must have two
observed scores for each individual. The same is true in estimating
$\sigma_e$. The pairs of scores for each individual may be observed by (1)
giving parallel forms of a test or giving the same test twice, or (2)
giving a single test which can be divided into equivalent halves.
Let us consider case (1). For each individual we shall have two
equations, similar to equation (7.3),


$$X_1 = X_\infty + E_1,$$
$$X_2 = X_\infty + E_2.$$


If $X_1$ and $X_2$ are equally good estimates of the individual's true
score $X_\infty$, we may subtract the second equation from the first and
obtain

$$X_1 - X_2 = E_1 - E_2.$$


This tells us that the difference between the two observed scores of
an individual is equal to the difference between the errors in the
scores, provided the two are equally good estimates of the individual's
true score. The standard deviation of the series of differences
$X_1 - X_2$ may thus be viewed as the standard deviation of the
series $E_1 - E_2$. If the errors are uncorrelated, the standard deviation
of the differences $E_1 - E_2$ is equal to the standard deviation of the
sums $E_1 + E_2$. Assuming that there is no correlation between errors,
we may write


$$\sigma_{E_1 + E_2} = \sigma_{X_1 - X_2}, \tag{7.11}$$


i.e., the standard deviation of the sums of errors in observed scores is
equal to the standard deviation of the differences between observed scores.
If we now can assume that the $X_1$ and the $X_2$ series of obtained
scores contribute equally to error variance, the standard error of
measurement $\sigma_e$ attaching to either $X_1$ or $X_2$ as an estimate of
$X_\infty$ will be $1/\sqrt{2}$* of that given by (7.11). Since
$1/\sqrt{2} = .707$, we finally have

$$\sigma_e = .707\,\sigma_{X_1 - X_2}. \tag{7.12}$$


Hence, to estimate $\sigma_e$, when we have pairs of scores observed by
giving parallel forms of a test or by giving the same test twice, we
find the standard deviation of the differences between scores and
multiply by .707. In finding the differences, the $X_1$'s may be
subtracted from the $X_2$'s or the $X_2$'s from the $X_1$'s, but the subtraction
must be consistent and the signs of the differences regarded. Ideally,
of course, the sum of the differences is zero, but practically this is
never the case. Otis (Ref. 9, p. 250) appears to have been the first
to point out the relationship in (7.12).

* Halving the variance results in a standard deviation $1/\sqrt{2}$ times
the original, as can easily be demonstrated.
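
The rule in (7.12) reduces to a few lines of arithmetic. The sketch below, in Python, assumes population standard deviations (division by N); the function name and the six pairs of scores are hypothetical and are not taken from the text.

```python
import math


def sem_from_parallel_forms(first, second):
    """Estimate sigma_e as .707 times the SD of the signed differences, per (7.12)."""
    if len(first) != len(second):
        raise ValueError("each individual needs a score on both administrations")
    # Subtract consistently (first minus second) and keep the signs.
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = sum(diffs) / len(diffs)
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / len(diffs)
    # Halving the variance is the same as multiplying the SD by 1/sqrt(2) = .707.
    return math.sqrt(var_diff / 2)


# Hypothetical scores of six individuals on two parallel forms.
form_a = [52, 47, 60, 38, 55, 44]
form_b = [50, 49, 57, 41, 55, 40]
sem = sem_from_parallel_forms(form_a, form_b)
print(round(sem, 2))
```

Note that the mean difference is removed before squaring, so a constant difference between the two forms does not enter the estimate.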

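The procedure in estimating $\sigma_e$ from half-test scores is essentially
the same as the above. Given the half-test scores


$$X_{\frac{1}{2}\mathrm{I}} = \tfrac{1}{2}X_\infty + E_{\frac{1}{2}\mathrm{I}},$$
$$X_{\frac{1}{2}\mathrm{II}} = \tfrac{1}{2}X_\infty + E_{\frac{1}{2}\mathrm{II}}$$

for each individual, we obtain by subtraction

$$X_{\frac{1}{2}\mathrm{I}} - X_{\frac{1}{2}\mathrm{II}} = E_{\frac{1}{2}\mathrm{I}} - E_{\frac{1}{2}\mathrm{II}}.$$

Since the standard deviation of the differences between errors is
equal to the standard deviation of their sums, provided the errors
are uncorrelated, we may write

$$\sigma_{E_{\frac{1}{2}\mathrm{I}} + E_{\frac{1}{2}\mathrm{II}}} = \sigma_{X_{\frac{1}{2}\mathrm{I}} - X_{\frac{1}{2}\mathrm{II}}}.$$

Since the sums $E_{\frac{1}{2}\mathrm{I}} + E_{\frac{1}{2}\mathrm{II}}$ are in fact the
errors in the total observed scores, we have


$$\sigma_e = \sigma_{X_{\frac{1}{2}\mathrm{I}} - X_{\frac{1}{2}\mathrm{II}}}. \tag{7.13}$$


Hence, to estimate $\sigma_e$ from half-test scores, we need only to find the
standard deviation of the differences between half-test scores. In
finding the differences we must subtract consistently and must
regard the signs of the differences. Rulon (Ref. 11) appears to have
been the first to point out the relationship in (7.13).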

The half-test method of estimating $\sigma_e$ is of wide usefulness. Let us
illustrate the method using a set of real data. The scores of Table
7.2 were obtained by giving a mental test to 36 college students.
The half-test scores (based on odd and even items of the test) and
the total scores are shown in the first three columns of the table.
The differences between half-test scores are shown in the fourth
column of the table. The standard deviation of the differences is
3.48. This is the $\sigma_e$ attaching to the total scores. We shall see what
we can do with it in the next section. At this point, let us use our
data to check the theoretical relationship between $r_{11}$ and $\sigma_e$.


TABLE 7.2
HALF-TEST SCORES AND TOTAL SCORES OF 36 COLLEGE
STUDENTS ON THE HENMON-NELSON TEST OF MENTAL
ABILITY, FORM A

              ODD ITEMS    EVEN ITEMS    ODD + EVEN    ODD - EVEN
              $X_{\frac{1}{2}\mathrm{I}}$    $X_{\frac{1}{2}\mathrm{II}}$    $X$    $X_{\frac{1}{2}\mathrm{I}} - X_{\frac{1}{2}\mathrm{II}}$

                 26            27            53            -1
                 26            20            46             6
                 30            27            57             3
                 18            21            39            -3
                 25            19            44             6
                 35            28            63             7
                 21            21            42             0
                 22            23            45            -1
                 25            25            50             0
                 28            24            52             4
                 31            23            54             8
                 28            28            56             0
                 25            24            49             1
                 33            34            67            -1
                 24            23            47             1
                 21            25            46            -4
                 26            27            53            -1
                 26            27            53            -1
                 28            23            51             5
                 24            26            50            -2
                 24            23            47             1
                 34            34            68             0
                 34            36            70            -2
                 17            15            32             2
                 27            29            56            -2
                 39            38            77             1
                 31            26            57             5
                 21            23            44            -2
                 34            29            63             5
                 21            24            45            -3
                 30            29            59             1
                 18            24            42            -6
                 28            31            59            -3
                 38            30            68             8
                 26            28            54            -2
                 21            20            41             1

$\sum X$        965           934         1,899            31
$\sum X^2$   26,943        25,052       103,527           463
$\sigma^2$    29.88         22.77         93.19         12.12
$\sigma$       5.47          4.77          9.65          3.48

We can determine the half-test reliability coefficient from the
half-test scores. The student can verify that
$r_{\frac{1}{2}\mathrm{I}\,\frac{1}{2}\mathrm{II}} = .78$. Using formula (7.1)
the coefficient of reliability for the whole test $r_{11}$ is .88.
The standard deviation of the whole-test scores is 9.65. Substituting
in (7.9), we have


$$\sigma_e = 9.65\sqrt{1 - .88} = 3.34.$$
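
The direct and indirect estimates can be recomputed from the half-test scores of Table 7.2. The Python sketch below is offered only as a check on the arithmetic. It assumes population standard deviations (division by N), and it assumes that formula (7.1), which is not reproduced in this section, is the usual step-up formula 2r/(1 + r); the helper names are illustrative.

```python
import math

# Odd-item and even-item half-test scores of Table 7.2 (36 students).
odd = [26, 26, 30, 18, 25, 35, 21, 22, 25, 28, 31, 28,
       25, 33, 24, 21, 26, 26, 28, 24, 24, 34, 34, 17,
       27, 39, 31, 21, 34, 21, 30, 18, 28, 38, 26, 21]
even = [27, 20, 27, 21, 19, 28, 21, 23, 25, 24, 23, 28,
        24, 34, 23, 25, 27, 27, 23, 26, 23, 34, 36, 15,
        29, 38, 26, 23, 29, 24, 29, 24, 31, 30, 28, 20]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Population standard deviation (divide by N)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(x, y):
    """Product-moment correlation of two paired series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sd(x) * sd(y))


total = [a + b for a, b in zip(odd, even)]
diff = [a - b for a, b in zip(odd, even)]

sem_direct = sd(diff)                    # equation (7.13)
r_half = correlation(odd, even)          # half-test reliability
r_whole = 2 * r_half / (1 + r_half)      # assumed form of formula (7.1)
sem_indirect = sd(total) * math.sqrt(1 - r_whole)   # equation (7.9)

print(round(sem_direct, 2), round(r_half, 2))
print(round(r_whole, 3), round(sem_indirect, 2))
```

The direct estimate comes to 3.48 and the half-test correlation to .78, as in the text. Carried at full precision, the stepped-up coefficient is about .874 and the indirect estimate about 3.42, a little above the 3.34 obtained from the rounded values .78 and .88; the direct estimate remains the larger of the two.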


That the value of $\sigma_e$ as computed directly is larger than its value as
determined by (7.9) is due to the fact that $\sigma_{\frac{1}{2}\mathrm{I}}$ is
not equal to $\sigma_{\frac{1}{2}\mathrm{II}}$. That the values are in good
agreement is due to the fact that the two standard deviations are
nearly equal. When the standard deviations of two series of paired
observed scores are equal, the direct and indirect methods of
determining $\sigma_e$ give identical results. When the standard
deviations are markedly unequal, $\sigma_e$ computed by
formula (7.9) is a gross underestimate of the extent of error in the
measuring process. Let us illustrate this important point numerically.
The two series of half-test scores below have unequal standard
deviations.


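          $X_{\frac{1}{2}\mathrm{I}}$    $X_{\frac{1}{2}\mathrm{II}}$    $X_{\frac{1}{2}\mathrm{I}} + X_{\frac{1}{2}\mathrm{II}}$    $X_{\frac{1}{2}\mathrm{I}} - X_{\frac{1}{2}\mathrm{II}}$

             2        2         4         0
             3        6         9        -3
             4        4         8         0
             5        8        13        -3
             6       10        16        -4

SUM         20       30        50       -10
MEAN         4        6        10        -2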

As determined from the difference column, $\sigma_e$ is about 1.7.
Correlating the half-test scores we obtain
$r_{\frac{1}{2}\mathrm{I}\,\frac{1}{2}\mathrm{II}} = .90$, which steps up to
about .95. The standard deviation of the total scores is about 4.1.
Substituting in (7.9) we get $\sigma_e = .90$. (If we determine $r_{11}$ from (7.9),
given $\sigma_e = 1.7$ and $\sigma_x = 4.1$, we will obtain $r_{11} = .83$.)
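
The figures for the five pairs of half-test scores can be reproduced as follows. This is a Python sketch under the same assumptions as before: population standard deviations, and the step-up formula 2r/(1 + r) taken as the form of formula (7.1). The names are illustrative.

```python
import math

# The two series of half-test scores with unequal standard deviations.
half_1 = [2, 3, 4, 5, 6]
half_2 = [2, 6, 4, 8, 10]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Population standard deviation (divide by N)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sd(x) * sd(y))


total = [a + b for a, b in zip(half_1, half_2)]
diff = [a - b for a, b in zip(half_1, half_2)]

sem_direct = sd(diff)                                  # from the difference column
r_half = correlation(half_1, half_2)
r_whole = 2 * r_half / (1 + r_half)                    # assumed step-up formula
sem_indirect = sd(total) * math.sqrt(1 - r_whole)      # equation (7.9)
r_implied = 1 - sem_direct ** 2 / sd(total) ** 2       # (7.9) solved for the reliability

print(round(sem_direct, 2), round(sem_indirect, 2))
print(round(r_half, 2), round(r_whole, 2), round(r_implied, 2))
```

The direct estimate is nearly twice the indirect one, which is the point of the illustration: with markedly unequal standard deviations, (7.9) understates the error. Small differences from the rounded figures in the text arise only from rounding of the intermediate values.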

When the standard deviations of the observed scores are unequal,
no estimate of error can be logically defended, but the one determined
directly from the differences between scores is at least a safer
criterion to use in interpreting observed scores. Let us point out 
again, however, that when the assumptions of equivalent tests or 
half-tests and random errors of measurement are unsound, no state- 
ment regarding reliability has clear meaning. 

Summary. The reliability theory and estimates that we have 
discussed rest upon two major assumptions: 


a. That the observed scores from parallel forms, test-retest, or half- 
test administration are comparable measures of the same thing, 
i.e., are truly equivalent except for errors of measurement. 

b. That the errors of measurement present in observed scores are 
random, i.e., compensating and uncorrelated with true scores 
or with themselves. 


When the assumptions are satisfied, the standard error of meas- 
urement and the reliability coefficient are related as in equation 
(7.9). Ordinarily the assumptions are not fully satisfied, and it is
preferable to estimate $\sigma_e$ directly from the differences between pairs
of test scores or pairs of half-test scores rather than by use of (7.9).
When the assumptions are poorly satisfied, any estimate of reliability
is cloudy in meaning.

In the next section we shall consider the interpretation and use of
estimates of reliability. 


Exercises 


13. Criticize the statement, “Errors of measurement in educational testing 
are not unlike errors of measurement in the physical sciences." What 
is a fundamental difference between the two? 

14. In estimating reliability by correlational methods, we correlate scores
whose components are considered to be true score plus random error.
Hence, the sum of cross products in deviation form, $\sum x_1 x_2$, may be
expressed $\sum x_1 x_2 = \sum (x_\infty + e_1)(x_\infty + e_2)$. Show that if
the errors are correlated with themselves or with true scores, the
correlation coefficient overestimates reliability.

15. With reference to stability of true score, correlation between errors, 
and equivalence of measures, compare the three common methods of 
estimating reliability. 

16. Compare the direct and indirect methods of estimating $\sigma_e$.

17. How would you estimate $\sigma_e$ for the mathematics test scores of Table 7.3?

18. Suppose you gave two equivalent forms of a test or gave a single test
two times to a group. How would you estimate reliability? The standard
error of measurement?


Interpretation and Use of Estimates of Reliability 


The two hypothetical questions that come up in interpreting the 
reliability of an observed measure are: (1) If the observations were 
repeated a large number of times, would the results be in agreement 
to an acceptable degree? (2) Is it reasonable to suppose that the 
mean of the measures would approach the “true dimension" as 
more and more measures were taken? 

The laboratory technician ordinarily can deal with the questions 
in a direct way. He can repeat the measurements on an object and 
take as many values as he needs to determine a "true value" to 
within a specified degree of precision. The measuring process ordi- 
narily does not change the object, and independence between suc- 
cessive observations can be maintained so that errors tend to be 
random. In this case both the estimation and interpretation of 
reliability are straightforward and convincing. 

In educational measurements these questions have to be ap- 
proached indirectly, under circumstances which make it difficult to 
interpret the answers. Rarely can there be assurance that the meas- 
uring process is not changing the abilities being measured to some 
indeterminable extent, and that errors in successive observations 
are independent. 

In this section we shall consider some of the complexities that arise 
in interpreting and using estimates of reliability in educational 
measurements. We shall find that reliability can at best be inter- 
preted only with reference to a particular group, instrument, 
experimental situation, and use of the measures. 

The Use of $\sigma_e$ in Interpreting an Observed Score. Both $\sigma_e$
and $r_{11}$ are measures of the reliability of obtained scores. The
reliability coefficient $r_{11}$ is an abstract measure and may be used to
compare directly the reliabilities of two or more tests or measurement
processes. It has little further usefulness.

Since $\sigma_e$ is a denominate number, being expressed in the unit of
the original measures, it can be used in judging the reliability of a
single observed score, provided two assumptions are satisfied. The
first assumption is that the errors are independent and distributed
normally, i.e., that the errors are truly random. The second is that
the errors are homoscedastic or scattered equally for the various
observed scores. When the assumptions are sound, $\sigma_e$, being the
standard deviation of the errors, enables us to judge the reliability
of the observed scores. To illustrate, $\sigma_e$ for the mental test scores of
Table 7.2 is 3.48. This tells us that the whole-test scores in the third
column are almost certainly within 3 × 3.48 or about 10.4 points of
the true scores they represent; that about 95 per cent of them are
within about 7.0 points; and that about 68 per cent are within
about 3.5 points. (Why?) Hence, the chances that a single score in
the table deviates from the true score it represents by more than
10.4 points are about .3 in 100; that it deviates by more than 7.0
points, about 5 in 100; and so on. Since $\sigma_e$ is the standard deviation
of the distribution of errors, we may use it and the area relationships
of Table A, Appendix C, to establish confidence limits of any
true score, or to determine the chances that a true score is above or
below some specified value.
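
The area relationships referred to above can be obtained without a printed table. The Python sketch below assumes, as the text does, that the errors are normally distributed with standard deviation 3.48; the function names are illustrative, and the observed score of 57 is simply one of the totals in Table 7.2.

```python
import math


def chance_within(points, sem):
    """Chance that a normally distributed error falls within +/- points of zero."""
    return math.erf(points / (sem * math.sqrt(2)))


def score_band(observed, sem, multiple):
    """Limits lying `multiple` standard errors on either side of an observed score."""
    return observed - multiple * sem, observed + multiple * sem


sem = 3.48  # sigma_e for the whole-test scores of Table 7.2
for multiple in (1, 2, 3):
    inside = chance_within(multiple * sem, sem)
    print(f"{multiple} x sigma_e = {multiple * sem:5.2f} points: "
          f"within {inside:.4f}, beyond {1 - inside:.4f}")

# A band of two standard errors about an observed score of 57.
low, high = score_band(57, sem, 2)
print(round(low, 2), round(high, 2))
```

The three proportions are the familiar normal-curve areas for one, two, and three standard deviations, so the statements in the text about 3.5, 7.0, and 10.4 points follow directly. The band is meaningful only so far as both assumptions (normal, independent errors and homoscedasticity) are sound.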

Regarding the assumptions underlying the use of $\sigma_e$, both can be
examined by means of the differences between whole-test or half- 
test scores. These differences, being differences between errors, will 
be distributed normally if the errors are distributed normally and 
are independent. The check works only one way, since normally 
distributed differences do not necessarily mean normally distributed 
independent errors, but it appears to be sufficient, practically 
speaking. The second assumption may be roughly checked by in- 
spection of the differences. A better check, however, is possible by 
comparing the standard deviations of sets of differences at various 
observed score intervals. In Table 7.2, for example, the standard 
deviation of the differences corresponding to scores above 60 might 
be compared to the standard deviation of differences corresponding 
to scores below 50, or between 50 and 60. Obviously, if marked 
inequalities exist, it would be improper to report a single standard 
error of measurement. 
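
The comparison suggested for Table 7.2 might be sketched as follows in Python. The three score intervals are those named in the text; the grouping code and names are illustrative, and population standard deviations (division by N) are assumed.

```python
import math

# Odd-item and even-item half-test scores of Table 7.2 (36 students).
odd = [26, 26, 30, 18, 25, 35, 21, 22, 25, 28, 31, 28,
       25, 33, 24, 21, 26, 26, 28, 24, 24, 34, 34, 17,
       27, 39, 31, 21, 34, 21, 30, 18, 28, 38, 26, 21]
even = [27, 20, 27, 21, 19, 28, 21, 23, 25, 24, 23, 28,
        24, 34, 23, 25, 27, 27, 23, 26, 23, 34, 36, 15,
        29, 38, 26, 23, 29, 24, 29, 24, 31, 30, 28, 20]


def sd(values):
    """Population standard deviation (divide by N)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


# Collect the signed differences by interval of the total (odd + even) score.
bands = {"below 50": [], "50 to 60": [], "above 60": []}
for a, b in zip(odd, even):
    total = a + b
    if total < 50:
        bands["below 50"].append(a - b)
    elif total <= 60:
        bands["50 to 60"].append(a - b)
    else:
        bands["above 60"].append(a - b)

band_sd = {name: sd(diffs) for name, diffs in bands.items()}
for name, diffs in bands.items():
    print(f"{name}: n = {len(diffs)}, SD of differences = {band_sd[name]:.2f}")
```

With only 36 cases the intervals contain 14, 15, and 7 students, so the comparison is suggestive at best; in a larger sample, marked inequalities among the three standard deviations would argue against reporting a single standard error of measurement.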

It has been the writer’s experience that the assumption of homo- 
scedasticity of errors frequently is questionable. Practically, the 
assumption means that a test or measuring instrument must be
equally accurate throughout the range. When the range is relatively
great and the sample is large, it is rather unusual to find the assump- 
tion clearly acceptable. In the next chapter, we shall return to this 
matter in connection with tests of homogeneity of variance. 

Real Reliability and Estimates of It. It is helpful in interpret- 
ing reliability and in understanding the consequences of unreliable 
data to distinguish between reliability and estimates of reliability. 
Unless we make the distinction we are apt to fall into the rather 
common mistake of assuming that circumstances which do not 
affect estimates of reliability do not affect real reliability. 

The fundamental purpose of measurement is to determine a 
“true dimension” of an object. If the determination is good, real 
reliability exists, and estimates of it will confirm the fact.
Unfortunately, however, estimates of reliability may be satisfactory, yet
the determination very poor.

When we have a set of measures of an ability in a group, we ordi- 
narily use them to do one or more of the following things: 


a. To distinguish between the individuals in the group. 

b. To determine the “true” mean or some other “true” summary 
statistic for the group. 

c. To estimate the “true” amount of the ability possessed by an 
individual. 


Constant or bias errors do not interfere with the first use. For 
example, test scores which are systematically too large or too small
will affect all individuals in the same way and relative standing 
will not be distorted. Although errors of measurement obviously 
interfere, the damage will not be serious if the differences between 
the individuals are relatively large as compared with the magnitude 
of the errors. 

The effect of sample variability or “range of talent" upon the 
reliability coefficient of an instrument is easily examined, provided 
we can assume that the instrument works equally well over the 
entire range. If this assumption is true, the standard errors of
measurement over two different ranges will be equal, and we can
write

$$\sigma_e = \sigma_1\sqrt{1 - r_1},$$
$$\sigma_e = \sigma_2\sqrt{1 - r_2}.$$

Equating and dividing we obtain


$$\frac{\sigma_1}{\sigma_2} = \frac{\sqrt{1 - r_2}}{\sqrt{1 - r_1}}, \tag{7.14}$$


in which $r_2$ is the estimated reliability of an instrument or measuring
process, having a reliability $r_1$ in a group in which the standard
deviation is $\sigma_1$, when applied to a group in which the standard
deviation is $\sigma_2$.

To illustrate the use of formula (7.14), suppose that a test shows
a reliability of .84 when applied to a group in which the standard
deviation of the observed scores is 12.0, and that we want to estimate
the reliability of the test for a group which would have a
standard deviation of 8.0. The values for substitution are $\sigma_1 = 12.0$,
$r_1 = .84$, and $\sigma_2 = 8.0$, so that

$$\frac{12.0}{8.0} = \frac{\sqrt{1 - r_2}}{\sqrt{1 - .84}}.$$

Solving for $r_2$ we get .64.
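
Formula (7.14) can be solved once for $r_2$ and then applied to any pair of standard deviations. The Python sketch below does so; the function name is illustrative, and the calculation rests on the same assumption as the formula, namely that the standard error of measurement is equal in the two groups.

```python
def reliability_in_new_group(r1, sd1, sd2):
    """Solve (7.14) for r2, assuming sigma_e is the same in both groups."""
    if not 0 <= r1 <= 1:
        raise ValueError("r1 must lie between 0 and 1")
    if sd1 <= 0 or sd2 <= 0:
        raise ValueError("standard deviations must be positive")
    # Error variance implied by the first group: sigma_1^2 (1 - r1).
    error_variance = sd1 ** 2 * (1 - r1)
    # The same error variance expressed as a share of the second group's variance.
    return 1 - error_variance / sd2 ** 2


# The illustration in the text: r1 = .84, sigma_1 = 12.0, sigma_2 = 8.0.
r2 = reliability_in_new_group(0.84, 12.0, 8.0)
print(round(r2, 2))
```

If the second group is so homogeneous that its observed variance is smaller than the implied error variance, the result is negative, which is a sign that the assumption of equal error over the two ranges cannot hold for those figures.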

Formula (7.14) can, of course, be used to estimate the effect of 
an increased range upon reliability, but the assumption upon which 
it rests is somewhat less plausible in this case. As a matter of fact,
the assumption appears to be rarely clearly acceptable. The chief
value of (7.14) is that it emphasizes the sensitivity of reliability
to variability in the group. Thus, the logic supports the common
sense notion that when the differences between individuals are 
relatively large, it is not difficult to distinguish between the indi- 
viduals reliably. 

Regarding the use of measurements to determine summary sta- 
tistics of a group, random errors theoretically do not affect the 
mean, since they tend to be compensating, but they do inflate 
the standard deviation. These facts can be deduced from equations 
(7.3) and (7.4). On the other hand, constant errors do not affect
the standard deviation, but they do affect the mean. Since means 
and standard deviations ordinarily are used together, both constant 
and random errors cloud interpretation to some extent. 

Random errors of measurement decrease or attenuate the coefficient
of correlation between two variables. It can be shown that
the “true” correlation coefficient $r_{\infty\omega}$ of variables X and Y,
estimated from the observed coefficient $r_{xy}$, is


$$r_{\infty\omega} = \frac{r_{xy}}{\sqrt{r_x}\,\sqrt{r_y}}, \tag{7.15}$$


in which $r_x$ is the reliability coefficient of the measures of X and $r_y$
the reliability coefficient of the measures of Y. This correction of an
observed $r_{xy}$ is known as correction for attenuation, the coefficient
$r_{\infty\omega}$ being the hypothetical true correlation between X and Y if
perfect measures (perfect in the sense of freedom from errors of
measurement) of both were available. It will be seen that the
correction is really quite fanciful. The relationship in (7.15) is
primarily useful because it brings out the effects of errors of
measurement on correlation and thus re-emphasizes the great need for
reliable measures in research.
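
The arithmetic of the correction is slight, as the following Python sketch shows. The function name and the three coefficients are hypothetical and are not taken from the text.

```python
import math


def correct_for_attenuation(r_xy, r_x, r_y):
    """Divide the observed correlation by the square roots of the two reliabilities."""
    if not (0 < r_x <= 1 and 0 < r_y <= 1):
        raise ValueError("reliability coefficients must lie between 0 and 1")
    return r_xy / (math.sqrt(r_x) * math.sqrt(r_y))


# Hypothetical values: observed correlation .40, reliabilities .80 and .50.
corrected = correct_for_attenuation(0.40, 0.80, 0.50)
print(round(corrected, 2))
```

Because the divisor is never greater than 1, the corrected coefficient is never smaller in absolute value than the observed one, and the less reliable the measures, the larger the correction. Nothing in the formula keeps the result below 1.00 when the reliability estimates are too low, which is one reason to treat the corrected value as an illustration of the effect of error rather than as a finding.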

Both errors of measurement and constant errors hamper the 
attempt to estimate the true ability of an individual. The former 
can be allowed for by interpretation of an observed score in terms 
of ce, so that no practical damage is done. There is no satisfactory 
way of dealing with the second. When measurements are used to 
estimate true abilities, the estimates may be seriously misleading 
if either constant or correlated errors are present. 

The various considerations above suggest several points to be 
kept in mind in interpreting reliability. Since we have only esti- 
mates of real reliability, since these estimates are based upon 
assumptions unlikely to be satisfied, and since the consequences 
of unreliable measures vary to some extent with the uses of the 
measures, practical questions regarding reliability usually are not
entirely answerable.

Estimates of reliability may or may not indicate the real reliabil- 
ity of the observed measures. Of far greater importance than esti- 
mation of reliability is consideration of types of errors. Constant 
errors do not affect estimates of reliability, but they may be dis- 
astrous in group comparisons. There is little doubt that part of 
the measured differences between socioeconomic and racial groups 
has been due to constant errors arising from differences in motiva- 
tion. (See Ref. 3.) Unless there is some assurance that a measuring 
process is relatively free from constant error, we cannot reliably
estimate either a true mean of a group or a true score of an individual
in the group. Inseparable from constant errors are changes
in the true amounts of ability in the group. When reliability is 
estimated by reapplication of an instrument after a considerable 
period of time, or when reliability of measures is estimated at 
different times during the progress of a learning experiment, it is 
impossible to tell whether there is a constant error or a change in 
ability. So far as estimates of reliability are concerned, the dis- 
tinction between constant error and change is not important; as 
regards real reliability it is vital. 

Correlation of errors with true scores and with themselves not 
only makes interpretation difficult, it destroys the foundation upon 
which reliability theory rests. Although it is not possible to deter- 
mine whether errors are correlated through examination of observed 
scores or their differences, it is not difficult to identify situations 
which invite correlation between errors. In general, when there is 
present in the testing situation or situations any factor which 
causes the observed scores for each individual to be consistently 
either above or below the corresponding true score there is corre- 
lation between errors. Restrictive time limits on tests when speed 
is not considered part of the ability tested, fatigue, failure to under- 
stand directions when this is not part of the ability tested, emotional 
strain, exceptional motivation, cheating, and distractions are 
examples of factors which tend to operate to bring about correlation 
between errors. 

In passing, let us note that observed measures make the individ- 
uals in a group seem more different than they are, since there are 
always errors of measurement present. The extent of the exaggeration,
on the average, is seen in the equation $r_{11} = \sigma_\infty^2/\sigma_x^2$. If an
instrument is characterized by a reliability coefficient of, say, .6, 
the true variance of the group is only .6 of the observed variance. 
Practically these facts are of little value, since we are limited in 
our analyses to the measures we can obtain, but they add depth 
to interpretation. 

Advantages and Limitations of the Common Methods of 
Estimating Reliability. The test-retest method is the only one 
of the three common methods of estimating reliability which meets 
the requirement of repeated measurement, but it is almost sure to 
be perplexed by constant error as well as varying changes in “true” 
scores due to learning and memory effects. The magnitude of the
errors probably is directly proportional to the complexity of the
ability under measurement. It would thus seem desirable to restrict 
the method to measures of the simpler abilities, such as motor co- 
ordination, reaction, sensation, and simple elements of perception. 

Theoretically, the parallel-form method is the most generally 
applicable and sound. When two equivalent forms of a test are 
available and can be administered within a period of, say, not less 
than a day or more than a week, the prerequisites of reliability 
estimation are usually best met. Moreover, since parallel forms 
double the sampling of content or test items, the reliability estimate 
better reflects the actual correlation of obtained scores with true 
scores and the actual error of measurement. Practically, however, 
it has several limitations. In practice it is often extremely difficult 
to construct a test of desired length and then to construct an 
“equivalent” test. The attempt frequently results either in a test 
containing items so similar to the first that the method reduces 
essentially to test-retest, or in a test containing items so dissimilar 
that equivalence patently does not exist. As another limitation, 
the amount of time needed for the determination of reliability is 
doubled. This can be a real obstacle, particularly in schools and 
colleges. 

It would seem that much of what is good in the parallel-form 
method can be had in the method of equivalent halves, provided 
that administration of the half-tests is separated by, say, not less 
than a day or more than a week. The time separation is desirable to
lessen the likelihood of correlation of errors and to take into account 
the normal day-to-day variation in individuals. This is essentially 
the parallel-form method, but less ambitious in scope. It would 
require only that the test or inventory which is to be used be repro- 
duced in halves rather than as a whole and that two half-testing 
periods be scheduled on separate days instead of a whole period at 
one time. 

The chief limitation to the equivalent-halves method, when both 
half-tests are applied during a single period, is the danger of
correlation between errors. It is well known that stepped-up half-test
estimates of reliability tend to be higher than estimates arrived at
by other methods, particularly so for speeded tests. It is sometimes
said that the method is indeterminate because there are many ways
of splitting a test into halves. While this is true, it is not
particularly important. All methods of estimating reliability are
indeterminate, in the sense that they are attempts to approximate some-
thing that is never known, the real reliability of the test. 

The great advantage of the equivalent-halves method is its 
convenience. In the practical situation, it is frequently the best 
that can be done. Ordinarily the best way of splitting a test into 
halves is to construct the whole test so that the odd-numbered 
items make up one half-test and the even-numbered items the 
other. Ideally this would be done on the basis both of editorial 
study of the items and experimental tryout. 

Reporting Reliability Data. It should be clear that a great deal 
of information is needed to interpret reliability estimates. While 
the minimum amount of information needed varies to some extent 
with the use that is made of the measures whose reliability is of 
concern, as a rule the following points should be considered in 
reporting research: 


a. The group of individuals: Needed information includes specification
of the population, description of the sampling procedures,
size of the group, and variability of the individuals. Any unusual 
characteristic of the group which might affect the reliability 
estimate and the use of the instrument or process in further 
samples should be noted. 

b. The testing or experimental situation: In this connection a descrip- 
tion of all factors in the situations or tests which may give rise 
to correlated errors, constant errors, or changes in “true scores” 
is needed. Special attention should be given to any unique or 
unexpected factors. 

c. Methods used in estimating reliability: There are a great many 
ways of estimating reliability, and no one way is “best” even 
in a given situation. The researcher should describe the method 
used and tell why it was considered appropriate and what the 
estimate means. 

d. Equivalence of parallel forms or half-tests: Needed data include
means and standard deviations of the two or more series of 
observed scores and a note regarding similarity of content of 
tests or half-tests. 

e. Errors of measurement: The total distribution of differences between
scores or half-test scores and the distributions at several intervals
of the range may help support the assumption of normally
distributed errors uncorrelated with true scores. In support of 
the assumption of homoscedasticity of errors, standard errors of 
measurement at several intervals of the range should be reported. 
(In a small sample, of course, this information is of little value.) 


Such information will not only be of great value to the reader, 
but will accomplish perhaps even a greater service. Knowing in 
advance that the information is needed, the researcher will himself 
deal more effectively with the reliability issues in his study than 
he otherwise would. 

Concluding Remarks. Since the assumptions underlying the 
estimation of reliability are rarely fully satisfied in practice, all 
interpretations of reliability should be cautious, and the use of 
reliability statistics should be accompanied with some misgivings 
until empirically verified. There are no statistical techniques to 
take the place of judgment and common sense in interpreting re- 
liability. This is particularly true in educational testing, where 
qualitative matters, such as content of the test and opportunity 
of the individuals in the group to have the common experiences 
presupposed by the test, may affect real reliability. 

We have confined our discussion of the interpretation of relia- 
bility mainly to the relatively narrow field of educational testing. 
The interpretation of the reliability of questionnaire, rating scale, 
score card, historical evidence, and so on usually involves other 
complexities and greater uncertainties. The underlying question, 
however, is the same, namely, what reasons are there for supposing 
that repeated, independent observations will yield approximately 
the same results. The difficulty of the question must not be allowed 
to detract from its importance. It is the central question in all 
research. There is no escape from the fact that the only use unre- 
liable evidence permits is the demonstration of its own unreliability. 


Exercises 


19. List at least three points that should be kept in mind in interpreting 
reliability. Which do you believe the most important? 

20. Criticize each of the following statements. What are the qualifications
needed to make each statement true?

a. A test having a reliability coefficient as low as .5 can be used to
measure reliably mean differences between groups.

b. An observed score at any place on the scale of scores can be
interpreted in terms of $\sigma_e$.

c. Constant errors do not affect the reliability of a test.

d. The reliability coefficients of two tests permit comparison of the
reliability of the tests.

21. Not infrequently a statement like the following is encountered in
research reports, “Although the Blank Inventory has little
demonstrable reliability it is the only instrument available.” What do you
think of such a statement?

22. It is sometimes suggested that an observed score be thought of as a
zone or band on the scale of scores, rather than as a point. What is
the merit of the suggestion?

23. The standard error of measurement of a set of ratings ranging from
1 to 10 is found to be 2.5. Approximately what are the chances that a
single rating is within 5 points of the true rating it represents?

24. The standard error of measurement of the total scores in Table 7.2 is
about 3.5. Which of the observed scores represent true scores which
are very probably above 50? Which represent true scores which are
very probably below 50? Interpret “very probably” as you wish.

25. Given the data N = 1,000, $\sigma_x$ = 12.0, $\sigma_e$ = 5.0:

a. How many of the observed scores fall more than 5 points from the
true scores they represent? 10 points?

b. If an individual has an observed score of 75, how sure can you be
that his true score is not 70 or below? 80 or above?

c. What are the assumptions upon which your answers are based?

d. What is the value of $r_{11}$?

26. A test applied to a group in which the standard deviation of the
observed scores is 15.0 has a reliability coefficient of .96. What coefficient
would you expect to observe if the test were given to a group
characterized by a standard deviation of 5.0? What assumption underlies your
estimate?

27. Suppose that the reliability of college grades is about .50 and the
reliability of high school grades about .60. If the observed correlation
between the two is .60, what would the theoretical correlation be if
both were perfectly reliable? Why is this information of no practical
value?

28. The stepped-up half-test coefficient of reliability of the mathematics
test estimated from the data of Table 7.3 is about .96. The standard 
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deviation of the whole-test scores is about 6.7. What coefficient would 
you estimate for the test if used in a group in which the standard devi- 
ation is 4.0? 


Test Item Analysis 


In present-day testing, analytic studies of reliability and validity 
usually begin with the individual items in the test. This sort of 
study commonly is known as item analysis. There are a great many 
complexities in item analysis, and we shall consider here only those 
aspects to which the statistical concepts we have developed in 
previous chapters are applicable. For a comprehensive treatment of 
item analysis and related topics, see Ref. 7, Chap. IX, and various 
pages listed in the index of that reference. We shall begin our brief 
discussion with the item analysis chart. 

The Item Analysis Chart. The item scores, half-test scores, and 
total scores on a mathematics test and the total scores on a statistics 
achievement test of 32 students are shown in Table 7.3. Correct 
responses to the items are scored “1,” incorrect responses “0.” 
Conventional item analysis always begins with items scored in this 
way. The item scores in the body of the table are the basic data for 
item analysis. All of the information we can obtain from the mathe- 
matics test, except that which might result from analysis of incor- 
rect responses, is available in the item scores. 

We are now ready to consider the various statistics which are 
used in test item analysis, but first let us note, parenthetically, that 
in practice extensive item analysis would not ordinarily be war- 
ranted by a sample as small as 32. 

Item Difficulty. There are at least three plausible ways of esti- 
mating the difficulty of a test item. First, as a matter of judgment, 
we might rank a number of items from easy to difficult, or estimate 
a given item more difficult than a second. When items have not been 
tried out, this obviously is the only way of estimating their diffi- 
culty. As a second way, we might estimate difficulty in terms of the 
average time needed to complete an item, the greater the time re- 
quired the greater the difficulty. This method has a great many 
practical disadvantages and, at present, is of theoretical interest 
only. 

The most useful way of estimating the difficulty of an item is in 
terms of the proportion of examinees who respond correctly, the 
smaller the proportion the more difficult the item. Inspection of the 
proportions at the foot of Table 7.3 indicates that item 27 is the 
most difficult for the group and item 1 least difficult.* 

There are several advantages, both practical and theoretical, 
in defining difficulty in terms of proportions succeeding. When 
items are scored 1 or 0, the mean of the total scores is equal to the 
sum of the difficulties. Thus, the sum of the p’s at the foot of Table 
7.3 is equal to the mean of the total test scores. Moreover, the 
variance of an item is a function of difficulty. This is easily shown. 
The sum of squares of the scores on a given item i is 

$$1^2 + 1^2 + \cdots + 0^2 + 0^2 + \cdots + 0^2 = pN,$$

and the sum of scores is 

$$1 + 1 + \cdots + 0 + 0 + \cdots + 0 = pN.$$

Substituting these values in formula (4.8), p. 145, and squaring we get 

$$\sigma_i^2 = \frac{1}{N^2}\left[N\,pN - (pN)^2\right] = p - p^2 = pq,$$

where q = 1 - p. Thus, the variance of an item depends upon difficulty. 
Item variance is maximized when p = q = .5. 
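
The relation between item difficulty and item variance can be checked by direct arithmetic. The following is a minimal sketch in Python; the item shown (passed by 20 of 32 examinees) is a hypothetical illustration and is not taken from Table 7.3, and the function name is our own.

```python
def item_stats(scores):
    """Return (p, variance) for a list of item scores coded 1 or 0."""
    n = len(scores)
    p = sum(scores) / n                      # proportion succeeding
    mean_square = sum(s * s for s in scores) / n
    variance = mean_square - p ** 2          # reduces to p * (1 - p) for 0/1 scores
    return p, variance

# Hypothetical item: 20 of 32 examinees respond correctly.
scores = [1] * 20 + [0] * 12
p, variance = item_stats(scores)
print(p, variance, p * (1 - p))

# Evaluate pq over a grid of difficulties and find where it is largest.
grid = [i / 10 for i in range(11)]
most_variable = max(grid, key=lambda x: x * (1 - x))
print(most_variable)
```

The variance computed from the raw scores agrees with pq, and over the grid the product is largest at p = .5.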

Both the distribution of total test scores and the reliability of the 
test are partially functions of item difficulty. Other things being 
equal, the most reliable test for an entire group is the test containing 
items of 50 per cent difficulty, i.e., items of maximum variance.† 


* It is customary to state item difficulty in terms of percentage succeeding. 
Thus, an item of 100 per cent difficulty for a group is one which is passed by 
all of the group; an item of 0 per cent difficulty is one which is passed by none. 

† Maximum variance indicates that the greatest number of individual differences 
have been brought out. That items of 50 per cent difficulty do this is 
easily shown. If 5 individuals in a group of 10 pass an item and 5 fail it, 25 
differences are brought out by the item, since each individual passing is different 
from each individual failing. If 6 pass and 4 fail, only 24 comparisons are 
permitted; if 7 pass and 3 fail, only 21; and so on. The illustration can readily 
be generalized. 

Tests containing items of less than 50 per cent difficulty discriminate 
more sharply among the individuals of better than average 
ability in the group, and hence are more reliable than easier 
tests in selecting a few from a group for scholarships or “honors.” 
Such tests tend to yield distributions skewed to the right. On the 
other hand, tests containing items of greater than 50 per cent diffi- 
culty tend to yield distributions skewed to the left, and hence dis- 
criminate more reliably among the individuals of less than average 
ability in a group. It is also true that the test which discriminates 
best between two groups, one above a given level of ability and the 
other below, contains items of such difficulty that they would be 
marked correctly by half of the individuals at the given level of 
ability. (See Ref. 10.) 

The disadvantage of using proportions succeeding as measures of 
item difficulties is that, unless the ability being tested is distributed 
rectangularly, equal differences between difficulties do not represent 
equal differences in ability, i.e., the difficulty scale is not linear. 
The disadvantage can be at least partially overcome by scaling 
items (see p. 223), although in practical work this rarely is advisable. 

Item Discrimination. When the scores on a particular item 
are correlated positively with the total scores on the test, the item 
is said to be discriminating. 

There are various ways of estimating the discriminative power of 
an item. (See Ref. 8). The simplest way is that of subtracting the 
proportion of correct responses in the half of the group having the 
lowest total scores from the proportion of correct responses in the 
half having the highest. Applying this method to, say, item 6 in 
Table 7.3, we would obtain a discrimination index of 0, since the 
proportion of correct responses is 10/16 in both upper and lower 
halves of the total test score distribution. For item 23 we would 
obtain an index of 14/16 - 6/16 or .50. A similar estimate could be 
made on the basis of upper and lower thirds. All such indexes may 
vary from —1 to 1. Items passed or failed by all in a group obviously 
have no discriminative power. 
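
The upper-lower halves index described above can be sketched as follows. The eight examinees and their scores are hypothetical; the function name is our own, and ties in total score are broken arbitrarily by the sort, which is an assumption the text does not address.

```python
def discrimination_index(item_scores, total_scores):
    """Proportion correct in the upper half minus that in the lower half."""
    # Rank examinees by total score, lowest first.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    half = len(order) // 2                   # a middle examinee is dropped if N is odd
    lower, upper = order[:half], order[-half:]
    p_upper = sum(item_scores[i] for i in upper) / half
    p_lower = sum(item_scores[i] for i in lower) / half
    return p_upper - p_lower

# Hypothetical data for eight examinees.
totals = [3, 5, 7, 9, 12, 14, 16, 18]
item = [0, 0, 1, 0, 1, 1, 0, 1]
index = discrimination_index(item, totals)
print(index)

# The arithmetic for item 23 quoted in the text.
item_23_index = 14 / 16 - 6 / 16
print(item_23_index)
```

Replacing the halves by upper and lower thirds changes only the size of the two slices.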

The best method of estimating discrimination is that of biserial 
correlation. As we have seen, there are two biserial coefficients, 
$r_{pb}$ and $r_b$, the latter being based upon the assumption that the 
dichotomized variable is normally distributed. Although $r_{pb}$ would 
seem the better and more logical coefficient, since item scores as 
observed are really dichotomous, the fact that $r_b$ can be approximated 
quickly lends to it a practical advantage. Flanagan (Ref. 4) 
has devised a way of approximating $r_b$ from the 27 per cent of the 
group scoring highest on the whole test and the 27 per cent scoring 
lowest. We reproduce an abridgment of a table based upon Flanagan's 
study in our Table 7.4. 

Let us find $r_b$ for item 23 of our mathematics test by use of 
Flanagan's method. Since 27 per cent of our group is about 9, we 
determine the proportion of correct responses among the upper 9 
and lower 9 individuals in the group. The proportion among the 
upper 9 is 9/9 or 1.00; and in the lower 9, 2/9 or .22. Entering 
Table 7.4 at column 98 and going down to row 22, we read .80. 
This is an approximate value of the normalized biserial coefficient 
of correlation of item 23 with the total test scores. 

As noted at the foot of Table 7.4, when the proportion of correct 
responses in the lower 27 per cent exceeds that in the upper, we 
enter the table with the lower 27 per cent proportion at the top 
and attach a negative sign to the coefficient. Items showing negative 
discrimination tend to be worse than useless in a test. On the face 
of it, correct responses to such items should be scored wrong and 
incorrect responses right, but such procedure would raise several 
knotty philosophical issues. Examination of negatively discrimi- 
nating items usually reveals flaws and inconsistencies which should 
have been detected in constructing the item. Statistical analysis is 
no substitute for careful construction and editing of items. At the 
same time, careful construction of items is no substitute for sta- 
tistical analysis. The two are best thought of as complementary. 

Since it utilizes only about one-half of the information available, 
Flanagan’s coefficient is considerably less accurate than the usual 
normalized biserial coefficient. Both normalized coefficients tend 
to be substantially greater than the product-moment coefficient of 
the dichotomous data. By use of formula (6.6) we would obtain for 
item 23, where $\bar{Y}_p = 19.4$, $\bar{Y}_q = 10.7$, $pq = .24$, $y = .38$, and 
$\sigma_y = 6.7$, 

$$r_b = \frac{(19.4 - 10.7)(.24)}{(6.7)(.38)} = .82,$$

and by use of formula (6.5) 

$$r_{pb} = \frac{(19.4 - 10.7)\sqrt{.24}}{6.7} = .64.$$


TABLE 7.4* 

NORMALIZED BISERIAL COEFFICIENTS† OF CORRELATION AS 
DETERMINED FROM PROPORTIONS OF CORRECT RESPONSES IN 
UPPER AND LOWER 27 PER CENT OF THE GROUP 

Columns: proportion of correct responses in the upper 27 per cent.‡ 
Rows: proportion of correct responses in the lower 27 per cent. 

     02 06 10 14 18 22 26 30 34 38 42 46 50 54 58 62 66 70 74 78 82 86 90 94 98
02 | 00 19 30 37 43 48 51 55 58 61 63 66 68 70 72 73 75 77 79 80 82 84 86 88 91
06 |    00 11 19 26 31 36 40 44 47 50 53 56 59 61 64 66 68 71 73 76 78 81 84 88
10 |       00 08 15 21 26 30 34 38 41 45 48 51 54 57 60 63 65 68 71 74 77 81 86
14 |          00 07 12 18 22 27 31 34 38 42 45 48 51 54 57 60 63 67 70 74 78 84
18 |             00 06 11 16 20 25 28 32 36 39 43 47 49 53 56 60 63 67 71 76 82
22 |                00 06 10 15 19 23 27 31 34 38 42 45 49 52 56 60 63 68 73 80
26 |                   00 05 09 14 18 22 26 30 33 37 41 44 48 52 56 60 65 71 79
30 |                      00 04 09 13 17 21 25 29 33 37 40 44 49 53 57 63 68 77
34 |                         00 04 09 13 17 21 25 29 33 37 41 45 49 54 60 66 75
38 |                            00 04 08 13 16 20 25 29 33 37 42 47 51 57 64 73
42 |                               00 04 08 12 16 20 25 29 33 38 43 48 54 61 72
46 |                                  00 04 08 12 16 21 25 30 34 39 45 51 59 70
50 |                                     00 04 08 13 17 21 26 31 36 42 48 56 68
54 |                                        00 04 08 13 17 22 27 32 38 45 53 66
58 |                                           00 04 09 13 18 23 28 34 41 50 63
62 |                                              00 04 09 14 19 25 31 38 47 61
66 |                                                 00 04 09 15 20 27 34 44 58
70 |                                                    00 05 10 16 22 30 40 55
74 |                                                       00 06 11 18 26 36 51
78 |                                                          00 06 12 21 31 48
82 |                                                             00 07 15 26 43
86 |                                                                00 08 19 37
90 |                                                                   00 11 30
94 |                                                                      00 19
98 |                                                                         00

* This table is abridged from J. C. Flanagan’s table of normalized biserial 
coefficients originally prepared for the Cooperative Test Service. It is included 
here with the generous permission of Dr. Flanagan and the Educational Testing 
Service of Princeton, New Jersey. 

† Decimal points are omitted. 

‡ If the proportion of correct responses in the lower 27 per cent exceeds that 
in the upper, enter the table with the lower 27 per cent proportion at the top 
and attach a negative sign to the coefficient. 


The difference between coefficients tends to be of little practical 
concern, however, since a ranking of items in order of discriminative 
power on the basis of any of the coefficients would ordinarily correspond 
rather closely to the ranking on the basis of another. Thus, 
if the purpose of the analysis were to delete a given number of items 
of low discrimination, one method would tend to be about as good 
as another. In fact, as a rule, about the same items would be elimi- 
nated by simpler methods, such as that of upper and lower halves. 
For more accurate analysis or for theoretical discussion, however, 
either the usual $r_b$ or $r_{pb}$ should be used. These coefficients make 
use of all of the information available. Moreover, $r_{pb}$, in particular, 
is useful in selecting items for a test in such a way that the standard 
deviation will be as large as possible. It can be shown that the sum 
of the products of the point biserial coefficients times the standard 
deviations of the items is equal to the standard deviation of the 
total test scores. For example, if the $r_{pb}$’s of the items of the 
mathematics test of Table 7.3 were multiplied by the respective item 
standard deviations $\sqrt{pq}$ (see p. 361), the thirty products would 
sum to 6.7, the standard deviation of the total mathematics scores. 
This relationship is useful in situations where it is desired to select 
a given number of items from a larger number for a test of maximum 
standard deviation. The items having the largest $(r_{pb})(\sqrt{pq})$ 
products are of course the ones which would be selected. It is true 
that in further samples the products and consequently the standard 
deviation will be affected by sampling fluctuations. However, if the 
original and later samples are large, the effects tend to be negligible, 
and the standard deviation will not be very different from the 
expected value. Although the above relationship holds exactly only 
for $r_{pb}$, $r_b$ may be similarly used in practical work. 
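
The relationship between the item products and the standard deviation of the total scores can be verified numerically. The sketch below uses a small hypothetical set of responses (six examinees, four items), not the data of Table 7.3. It relies on the fact that the point biserial coefficient is the ordinary product-moment correlation computed with one variable scored 1 or 0; the helper names are our own.

```python
from math import sqrt

def pstdev(xs):
    """Standard deviation with N (not N - 1) in the denominator."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def pearson(xs, ys):
    """Product-moment correlation; equals the point biserial when xs is 0/1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical responses: rows are examinees, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
items = list(zip(*responses))

# One product per item: point biserial with the total times the item's sigma.
products = [pearson(item, totals) * pstdev(item) for item in items]
sigma_total = pstdev(totals)
print(sum(products), sigma_total)

# Items ranked by their contribution to the total standard deviation.
ranking = sorted(range(len(items)), key=lambda i: products[i], reverse=True)
print(ranking)
```

The two printed values agree apart from rounding error, and the ranking lists the items in the order in which they would be retained for a test of maximum standard deviation.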

It should be emphasized that $r_b$ and $r_{pb}$, although used for similar 
purposes and consistent in meaning, are not usually equal, the 
former tending to be substantially larger than the latter. For this 
reason, in interpreting and comparing reported biserial coefficients, 
one should always note which of the two was used. 
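
The size of the difference between the two coefficients can be seen by computing both from the same summary values. The sketch below uses the standard forms of the point biserial and biserial formulas, which we assume correspond to formulas (6.5) and (6.6), together with the values quoted in the text for item 23 (means 19.4 and 10.7, standard deviation 6.7, pq = .24, ordinate y = .38). The function names are our own.

```python
from math import sqrt

def point_biserial(mean_pass, mean_fail, sd_total, pq):
    """Point biserial: mean difference in sigma units times sqrt(pq)."""
    return (mean_pass - mean_fail) / sd_total * sqrt(pq)

def biserial(mean_pass, mean_fail, sd_total, pq, ordinate):
    """Normalized biserial: mean difference in sigma units times pq / y."""
    return (mean_pass - mean_fail) / sd_total * pq / ordinate

# Summary values quoted in the text for item 23.
r_pb = point_biserial(19.4, 10.7, 6.7, 0.24)
r_b = biserial(19.4, 10.7, 6.7, 0.24, 0.38)
print(round(r_pb, 2), round(r_b, 2))

# The ratio of the two coefficients depends only on the split: sqrt(pq) / y.
print(round(r_b / r_pb, 2))
```

Since the ratio of $r_b$ to $r_{pb}$ is $\sqrt{pq}/y$, which exceeds 1 for every split, the biserial coefficient is always the larger of the two in absolute value.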

Other things being equal, the greater the discrimination of the 
items, the more reliable the whole test. We digress for a moment to 
consider terminology. The terms item reliability and item validity 
are sometimes used synonymously with item discrimination. It 
would seem preferable to limit reliability to the sense of correlation 
between repeated measures or similar measures and to restrict 
validity to mean consistency of items with criteria outside of the 
test. 

Although our discussion to this point has been in terms of test 
items and continuous total test scores, the concept of item dis- 
crimination has wide application. We might, for example, correlate 
the “yes” or “no” responses on a questionnaire item with some 
measure of response to parts or all of the questionnaire. Whenever 
we can correlate, by one of the methods discussed in the preceding 
chapter, the responses to a single item with some larger measure 
of performance on an instrument, we can obtain an index of the 
discriminative power or consistency of the item. The deletion of 
inconsistent items will generally improve the instrument. 

Item Intercorrelation. The intercorrelations of the items on a 
test can be determined by fourfold point methods. Let us find the 
correlation between items 12 and 15 of Table 7.3. We first enter 
the item scores as tallies in a fourfold table as shown below: 


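[Fourfold table of tallies: Item 15 (rows 1 and 0) against Item 12 (columns 0 and 1). Column totals: 19 and 13; N = 32.] 


By formula (6.8) we obtain [equation not recoverable], and by formula (6.9) 

$$r_t = \cos(\ldots \times 180^\circ) = \cos 65^\circ = .42.$$
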


Since the amount of labor in computing the intercorrelation coeffi- 
cients is ordinarily fantastic [there are n(n — 1)/2 intercorrelations 
in a test containing n items], item intercorrelation is mainly of 
theoretical interest. The idea does have some practical applications, 
however. It has already been suggested as one way of determining 
the consistency of cross-checking items in a questionnaire (p. 260). 
It may be useful in eliminating interdependent items. When the 
intercorrelations of the items in a test are high and the items 
approximately equal in difficulty, the distribution of total test 
scores tends to be bimodal and thus to discriminate sharply at the 
middle of the scale. (See Ref. 5, p. 490.) This fact may be helpful 
in selecting items for a test designed to discriminate sharply be- 
tween individuals of moderate ability. 
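
Two common fourfold coefficients can be sketched in a few lines: the phi (fourfold point) coefficient and the cosine approximation to the tetrachoric coefficient. We assume, without being able to confirm it from this section, that these correspond to formulas (6.8) and (6.9). The cell counts below are hypothetical; they were chosen only to agree with the Item 12 totals of 19 failing and 13 passing.

```python
from math import cos, pi, sqrt

def phi_coefficient(both_pass, first_only, second_only, both_fail):
    """Product-moment correlation of two items scored 1 or 0."""
    a, b, c, d = both_pass, first_only, second_only, both_fail
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

def cosine_estimate(both_pass, first_only, second_only, both_fail):
    """Cosine approximation to the tetrachoric coefficient."""
    agree = sqrt(both_pass * both_fail)
    disagree = sqrt(first_only * second_only)
    return cos(pi * disagree / (agree + disagree))   # pi radians = 180 degrees

# Hypothetical tallies for two items in a group of 32.
cells = dict(both_pass=9, first_only=4, second_only=8, both_fail=11)
phi = phi_coefficient(**cells)
r_cos = cosine_estimate(**cells)
print(round(phi, 3), round(r_cos, 2))

# Number of distinct item pairs in a 30-item test: n(n - 1)/2.
n_items = 30
pairs = n_items * (n_items - 1) // 2
print(pairs)
```

The count of pairs for a 30-item test shows why a full set of intercorrelations was rarely computed by hand.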

Estimates of Test Reliability Based upon Item Variance. 
Kuder and Richardson (Ref. 6) have shown that an estimate of 
test reliability may be made from the variance of the total scores 
on a test and the sum of item variances. Their most generally applicable 
formula may be expressed 

$$r_{11} = \frac{n}{n-1}\left(\frac{\sigma^2 - \sum pq}{\sigma^2}\right) \qquad (7.16)$$

in which n is the number of items in the test, $\sigma^2$ is the variance of 
the total scores, and $\sum pq$ is the sum of the products of the proportions 
passing and failing each item, i.e., the sum of the item variances. 

Let us use the Kuder-Richardson formula to estimate the reliability 
coefficient of the mathematics test whose item statistics 
are shown in Table 7.3. The sum of the pq products shown in the 
last row of the table is 5.6. The value of $\sigma^2$ is 44.4, and n = 30. 
Substituting in the formula we have 

$$r_{11} = \frac{30}{29}\left(\frac{44.4 - 5.6}{44.4}\right) = .90.$$

If the items in a test are in fact equal in difficulty, formula (7.16) 
simplifies to 

$$r_{11} = \frac{n}{n-1}\left(\frac{\sigma^2 - npq}{\sigma^2}\right) \qquad (7.17)$$

in which p is equal to $\bar{X}/n$, the mean total score divided by the number 
of items, and q equals 1 - p. If we apply formula (7.17) to the mathematics 
test data of Table 7.3, the values for substitution are 

$$n = 30, \quad \sigma^2 = 44.4, \quad p = .54, \quad q = .46,$$

so that 

$$r_{11} = \frac{30}{29}\left(\frac{44.4 - 30(.54)(.46)}{44.4}\right) = .86.$$


Formula (7.17) requires much less information than (7.16). Em- 
pirical study indicates that it gives fairly good results, even when 
item difficulties vary considerably, as in the example. 
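
The two Kuder-Richardson estimates are easily computed. The sketch below restates formulas (7.16) and (7.17) as functions (the function names are our own) and applies them to the values quoted above for the mathematics test.

```python
def formula_7_16(n_items, total_variance, sum_pq):
    """Reliability estimate from the total variance and the sum of item variances."""
    return n_items / (n_items - 1) * (total_variance - sum_pq) / total_variance

def formula_7_17(n_items, total_variance, mean_p):
    """Simplified estimate treating every item as having difficulty mean_p."""
    sum_pq = n_items * mean_p * (1 - mean_p)
    return formula_7_16(n_items, total_variance, sum_pq)

# Values quoted in the text: n = 30, variance 44.4, sum of pq = 5.6, p = .54.
r_full = formula_7_16(30, 44.4, 5.6)
r_short = formula_7_17(30, 44.4, 0.54)
print(round(r_full, 2), round(r_short, 2))
```

Because npq computed from the average difficulty is never smaller than the sum of the separate pq products, the simplified formula cannot exceed formula (7.16); the two agree only when all items are equally difficult.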

The Kuder-Richardson methods, like the half-test method, do 
not provide estimates of reliability in the sense of agreement be- 
tween repeated measurements. Since they utilize scores obtained 
from a single administration of a test, it would appear likely that 
to some extent they overestimate reliability. They do not take into 
account ordinary day-to-day variation in individuals and are sus- 
ceptible to correlation between errors of measurement. Like the 
half-test method, they are inappropriate for speeded tests. 

Item Validity. A test item may be said to be formally valid when 
it is consistent with content which has been taught, specifications 
drawn up in advance, the opinions of experts, and so forth. It is 
experimentally valid when it correlates with a criterion variable 
outside of the test. 

Any of the methods of estimating the discriminative power of an 
item, mentioned earlier, may be used in estimating item validity. 
For example, to determine the experimental validity of a mathe- 
matics test item of Table 7.3 as a predictor of achievement in sta- 
tistics, as measured, we must find out whether the item scores are 
correlated with the statistics achievement scores. Let us consider 
item 12. The achievement scores for those passing and failing the 
item are shown below. 


Item 12   1: 92, 100, 105, 94, 88, 84, 93, 98, 80, 82, 81, 80, 86 
          0: 93, 90, 94, 72, 82, 81, 79, 85, 78, 78, 87, 84, 93, 78, 74, 75, 84, 62, 69 


The mean achievement score of those passing the item is 89.5; 
that of those failing the item 80.9. The respective proportions of 
passes and failures are .41 and .59, and the standard deviation of the 
achievement scores is 9.2. When we substitute these values in 
formula (6.5) we obtain 

$$r_{pb} = \frac{(89.5 - 80.9)\sqrt{.41 \times .59}}{9.2} = .46.$$

We might, of course, have used $r_b$, or Flanagan’s short-cut, or the 
method of upper-lower halves in our analysis. 
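
The computation for item 12 can be reproduced from the achievement scores listed above. The following sketch uses Python's standard statistics module; the point biserial is written in the standard form, which we take to be formula (6.5), with N rather than N - 1 in the standard deviation.

```python
from math import sqrt
from statistics import mean, pstdev

# Statistics achievement scores, grouped by response to mathematics item 12.
passed = [92, 100, 105, 94, 88, 84, 93, 98, 80, 82, 81, 80, 86]
failed = [93, 90, 94, 72, 82, 81, 79, 85, 78, 78, 87, 84, 93,
          78, 74, 75, 84, 62, 69]

n = len(passed) + len(failed)
p = len(passed) / n                      # proportion passing the item
q = 1 - p                                # proportion failing
sd_all = pstdev(passed + failed)         # standard deviation of all 32 scores

r_pb = (mean(passed) - mean(failed)) / sd_all * sqrt(p * q)
print(round(mean(passed), 1), round(mean(failed), 1), round(sd_all, 1))
print(round(r_pb, 2))
```

Working from the unrounded means and proportions gives the same coefficient, to two decimals, as the rounded values substituted in the text.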

The validity coefficients of the majority of the items in Table 
7.3 are no better than chance magnitude. As a rule the validity 
coefficients of the items in a predictor test are low and, when the 
sample is small, unreliable. Other things being equal, the higher the 
validity coefficients of the items the greater the validity or predictive 
value of the test. The validity of tests usually can be improved to 
some extent through item analysis; however, it is common experi- 
ence that items showing substantial validity are hard to find. 

It is often the case that the criterion variable can best be observed 
in two categories. In validating the items used in a personnel selec- 
tion test, for example, the criterion variable, success on the job, 
may be available in two categories, "succeeded" or "failed." 
When this is the case, the fourfold point methods of correlation are 
called for. If the criterion variable is observed in more than two 
categories, contingency correlation can be used. 

Concluding Remarks. Item statistics are relatively much 
affected by sampling fluctuations; and, as a rule, those derived from 
a small sample can be applied with little confidence to other sam- 
ples. The relation of item statistics to whole-test statistics is complex. 
The content, difficulty, discriminative power, and intercorrelations 
of items interact to establish the reliability and validity of a test, 
but the total relationship is too complex to make use of in practical 
work. 

Item analysis is not, however, of theoretical interest only. When 
samples are of fair size, say about 60 or more, item analysis yields 
reliable information indispensable in test improvement. In quite 
small samples, item analysis ordinarily results in deleting or chang- 
ing some items and a somewhat better test for future use. The rela- 
tion of a particular item statistic, such as a measure of difficulty 
or an index of discrimination, to whole-test statistics may be useful 
in developing tests for special purposes. Several such possibilities 
were brought out in our discussion. Furthermore, item analysis will 
usually result in improved skill in item construction. It is always 
instructive to examine the content, wording, and position of the 
various items in a test after their statistics are known. 

Nothing has been said about “distractor” or incorrect-response 
analysis or about the correction of test scores for chance success 
on the items. (See exr. 38.) The student will find these topics well 
treated in Ref. 2. 


Exercises 


29. Apply the upper-lower halves method, $r_{pb}$, $r_b$, and Flanagan's method 
of estimating item discrimination to one or more of the items of Table 
7.3 and compare results. 

30. By the same methods estimate and compare validity indices of several 
of the items as predictors of achievement in statistics. 

31. Find the intercorrelation of two of the items of Table 7.3. 

32. In a preliminary investigation of the study habits of college freshmen, 
a researcher gave an inventory comprising 60 items to about 200 freshmen. 
How would he check the reliability and validity of the items, if 
his purpose is to find out whether there is relationship between study 
habits and grades? 

33. A teacher sets out to construct an objective semester achievement test 
of about 100 items. He plans to construct and try out items over a 
period of several semesters. What item analyses should he make? 

34. A personnel official desires to construct an objective aptitude test of 
about 50 items to be used in the selection of employees. What procedures 
would you recommend? 

35. Show that, if 50 in a group of 100 pass an item and 50 fail, more 
individual differences are brought out than by any other numbers passing 
and failing. 

36. Sketch frequency polygons showing marked negative skewness, 
bimodality, and positive skewness. Which one magnifies differences 
between individuals of high ability, which differences between 
individuals of low ability, and which differences between individuals of 
moderate ability? 

37. Summarize what seem to you to be the major benefits of item analysis. 

38. The two most widely used formulas for the correction of test scores 
for chance success on items are $S = R - \frac{W}{n-1}$ and $S' = R + \frac{O}{n}$, in 
which S is the total score, R is the number of correct responses, W the 
number of incorrect responses, O is the number of items omitted, and 
n is the number of options on each item. Show by numerical illustration 
or prove in general that the two sets of total scores S and S’ are 
perfectly correlated. 


Chapter VIII 


Statistical Inference 


STATISTICAL DATA are subject to constant errors due to bias 
and to the variable errors of measurement and of sampling. In the 
preceding chapter we considered errors of bias and of measurement 
at some length. In this chapter we shall deal with sampling errors. 

The theories of errors of measurement and of sampling are not 
unrelated. When an obtained score is taken as an estimate of the 
corresponding but unknown true score an error of measurement 
attaches to the obtained score. Further scores obtained under 
comparable conditions are usually not in exact agreement with the 
first. Analogous to this, when a sample statistic, such as a mean, 
is taken as an estimate of the corresponding but unknown true 
statistic of the population, a sampling error attaches to the sample 
statistic. The statistic in further samples can be expected to show 
variation, to some extent. Since the scores or measures in a sample 
are rarely if ever perfectly reliable, a sample statistic usually contains 
both sampling and measurement errors. It is extremely difficult, 
however, to treat both types of variable errors in the same discussion. 
For this reason, we shall first consider sampling errors, assuming 
that the measures in the sample are perfectly reliable. Later we shall 
elaborate certain of the effects of errors of measurement upon sample 
statistics. 

In the preceding pages we have been concerned primarily with the 
calculation and interpretation of descriptive statistics, such as the 
mean, standard deviation, and coefficient of correlation. In several 
of the discussions, however, important concepts in sampling theory 
were anticipated and briefly previewed. As a background for the 
present chapter, the student may find it helpful to reread at this 
time pp. 6-14, 65, 98-100, 171-173, 328-329. 


Sampling Theory and Statistical Inference 


The concepts underlying sampling theory previewed in earlier 
chapters may conveniently be summarized in seven paragraphs: 


a. A sampling problem exists whenever the conclusions derived 
from observation of a limited number of individuals are applied 
to a larger (usually much larger) number of individuals. The 
former number constitutes the sample; the latter the population. 
The generalized conclusions commonly are called statistical 
inferences. Sampling theory and procedures are concerned with 
the conditions under which sound inferences about the charac- 
teristics of a population can be drawn from a sample. 

b. The most important of these conditions is that of a random sam- 
ple of individuals from a clearly specified population. Unless this 
condition is met, sample evidence has no demonstrable generality. 
Only a random sample can be considered to be an unbiased and 
therefore representative sample, and only a random sample 
permits inferences of a determinable degree of certainty. 

c. When the method of sampling assures every individual in the 
population the same chance of being drawn as any other indi- 
vidual, the sample is considered to be random. This sort of sam- 
pling is called simple random sampling. 

d. A random sample provides an approximate replica of the popula- 
tion, the goodness of the approximation depending upon the size 
of the sample. As the size of the sample is increased, the form and 
the statistics of the sample distribution approach those of the 
population. When the size of the sample is large—say 100 at 
least—the frequency polygon is usually smooth enough to give 
a good idea concerning the form of the population distribution. 
The distribution of even a quite large sample, however, can be 
expected to diverge from population form to some extent owing 
to the fluctuations of sampling. 

e. Like the sample distribution, sample statistics are subject to 
chance fluctuations. A sample statistic does not ordinarily agree 
with that derived from a second sample or with the population 
parameter.* The amount of fluctuation depends upon the vari- 
ability of the population and the size of the sample. 


f. Certain statistics tend to fluctuate less than others in successive 
samples. In sampling from a normal population, the mean 
fluctuates less than other measures of central tendency and 
therefore provides the most reliable estimate of the central tendency 
of a normal population. Similarly, the standard deviation 
is the most reliable estimate of the variability of a normal population. 
In other words, the mean and standard deviation are 
more reliable than other similar measures, because they tend to 
approximate more closely the corresponding population parameters 
and hence to fluctuate less from sample to sample. It is also 
true that the product-moment coefficient of correlation is the 
most reliable estimate of linear relationship in a normal bivariate 
population. 

g. An inference based upon sample evidence rests upon three 
points: (1) the individuals in the sample are representative of 
a population of individuals; (2) certain facts are observed to 
characterize the individuals of the sample; (3) therefore, probably 
and approximately, the observed facts characterize the 
individuals of the population. 


* A statistical measure, such as a mean, standard deviation, or correlation 
coefficient, calculated from a sample is called a statistic; the corresponding, 
usually unknown, measure in the population is called a parameter. In the 
sampling situation, knowledge concerning parameters is inferred from sample 
statistics. 


The Role of Statistical Inference. When exact meanings are 
given to the words probably and approximately, the nature of statistical 
inference is well delineated in paragraph (g), above. These 
meanings will emerge a little later in our discussions of probability, 
sampling distributions, hypotheses, and estimation. Preliminary to 
these more technical topics, let us consider informally the role of 
inference in research. 

The possibility of making inferences about a population from the 
information provided by a sample is fundamental in research work. 
It is an exciting possibility. In Walker's words (Ref. 24, p. 229): 


The idea that information obtained from a relatively small number 
of cases actually examined can be used to throw light on the characteristics 
of a vast universe which has not been examined is an exciting 
idea, which ceases to amaze us only when familiarity renders it commonplace. 
That the sample not only furnishes an estimate of some characteristic 
of the unknown population, but also furnishes a measure of the 
amount of confidence which can be placed in the estimate, is still more 
remarkable. 


There are two broad types of questions regarding populations 
which we attempt to answer by inferences drawn from the sample. 
First, we may ask whether some hypothesis or belief about the
population is consistent with the evidence provided by the sample.
For example, we find that, say, 60 per cent of the voters in a random
sample from a specified population are in favor of some proposal; 
and we ask whether the belief that opinion in the population is 
actually evenly divided on the proposal is consistent with this sam- 
ple finding. This amounts to asking whether, in sampling from a 
population in which opinion is evenly divided, it is reasonable to 
suppose that a 60%:40% sample would arise through sampling 
fluctuations. As another example, we may ask how reasonable is 
the belief that the mean number of hours in the working week of 
Pennsylvania high school teachers is as low as 40 hours, if the mean 
in a sample of 250 teachers is found to be 44 hours. 

As a second kind of question, we may ask within what limits
must the value of a population parameter lie to be reasonably
consistent with the value of a corresponding sample statistic. In
the first example above, this amounts to asking within what limits
must the population percentage of favorable opinion lie in order to
make a sample containing 60 per cent favorable a reasonable (not 
improbable) occurrence. In the second example, we may ask within 
what limits must the population mean lie in order to make the 
sample mean 44 hours a not improbable value. 
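
The two kinds of question can be sketched numerically for the teacher example. The sample size (250), the sample mean (44 hours), and the hypothesized mean (40 hours) come from the text. The standard deviation of 5 hours is an assumption borrowed from the illustration given later in this chapter, and the standard error formula sd/sqrt(N) and the normal approximation are likewise assumed here rather than derived.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the text.
n = 250
sample_mean = 44.0
hypothesized_mean = 40.0

# ASSUMPTION: standard deviation of 5 hours (used illustratively later in the chapter).
sd = 5.0

# ASSUMPTION: the sample mean is normally distributed with spread sd / sqrt(n).
standard_error = sd / sqrt(n)

# First kind of question: is a population mean of 40 consistent with a sample mean of 44?
z = (sample_mean - hypothesized_mean) / standard_error
two_sided_p = 2 * (1 - NormalDist().cdf(abs(z)))

# Second kind of question: within what limits must the population mean lie
# for a sample mean of 44 to be a not improbable value (here, at the .05 level)?
z_crit = NormalDist().inv_cdf(0.975)
lower = sample_mean - z_crit * standard_error
upper = sample_mean + z_crit * standard_error

print(f"standard error = {standard_error:.3f}")
print(f"z = {z:.2f}, two-sided probability = {two_sided_p:.6f}")
print(f"limits for the population mean: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the hypothesized value of 40 lies far outside the limits, which is one way of seeing that the two questions are closely related.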

It will be seen, as our discussion of sampling and inference pro- 
ceeds, that the two types of questions are closely related. This will 
be brought out in connection with testing hypotheses and statistical 
estimation. But first we need to take up the two bases of inference: 
probability and sampling distribution. 

The Meaning of Probability. The inferences that are drawn 
from a sample are always accompanied by a statement indicating 
the amount or degree of uncertainty in the inferences. These state- 
ments typically are made in terms of probability. 

The philosopher Bishop Berkeley is supposed to have said,
“Probability is the guide to life.” While this may be debated, there
can be no debate concerning the central position that probability 
occupies in statistical inference. The word probable, although usu- 
ally somewhat ambiguous in ordinary discourse, is rather uni- 
versally used to qualify a proposition which is regarded as less than 
certain, but for which there exists, or is believed to exist, some 
evidence. Such statements as “It appears probable that there is a 
real difference between these groups” and “There probably is some 
relationship between frustration and neurosis” have in common the 
idea that something is more likely to be true than false. To be useful
in statistics, however, the terms probable and probability must be 
given exact, quantitative meaning. 

Although any definition of probability raises stubborn philo- 
sophical questions (see Ref. 20, p. 242), in sampling theory it is 
rather generally agreed to treat probability as equivalent to relative 
frequency. Thus, to say that the probability of a head on a single 
toss of a coin is 1/2 is to imply that if the coin were tossed over 
and over again the relative frequency of heads would approach 
1/2 or .5. If each time the coin were tossed one would guess “heads,” 
he would be right one-half of the time over the long run. To say 
that .51 is the probability that a child to be born will be a boy is 
to imply that in the past the relative frequency of male births has 
been found to be .51. To state that .16 is the probability of drawing 
at random from a normal population of measures a single measure 
which deviates from the mean of the population by +1σ or more
is to imply that in repeated sampling from the population the rela- 
tive frequency of such deviates would approach .16. 

In general, we may say that the probability of an event is the 
observed relative frequency or theoretical relative frequency of the 
event over the long run. It is possible to define probability more 
rigorously, but, however defined, probability as used in statistical 
inference is equivalent to relative frequency. 
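
The relative-frequency idea can be illustrated with a small simulation. This is only a sketch: the seed and the numbers of tosses are arbitrary choices, and the simulated coin is assumed to be fair.

```python
import random
from statistics import NormalDist

random.seed(0)  # arbitrary seed, so that the run can be repeated


def relative_frequency_of_heads(n_tosses):
    """Toss a simulated fair coin n_tosses times; return the proportion of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The observed relative frequency settles toward .5 as the tosses accumulate.
for n_tosses in (10, 100, 10_000, 100_000):
    print(n_tosses, relative_frequency_of_heads(n_tosses))

# Theoretical relative frequency of a measure lying +1 sigma or more above
# the mean of a normal population.
p_upper_tail = 1 - NormalDist().cdf(1.0)
print(round(p_upper_tail, 2))
```

The last figure is the .16 quoted in the text for a deviation of +1σ or more.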

We shall have occasion to use two elementary laws or rules of 
probability. The first is the addition rule, which may be stated: 


If P(A) is the probability of event A, and P(B) the probability of 
event B, the probability that either A or B will occur is the sum of P(A) 
and P(B), provided A and B are mutually exclusive. 

Let us illustrate the rule by reference to drawing at random a single
measure from a normal distribution. The probability that the
measure will deviate from the mean of the distribution by as much
as +1σ is .16. The probability that it will deviate from the mean
by as much as -1σ also is .16. Hence, by the addition rule, the
probability that it will deviate from the mean by as much as +1σ
or -1σ is .16 + .16 or .32.

The addition rule may readily be extended to more than two 
mutually exclusive events. For example, the probability of drawing 
either an ace, or a king, or a queen, or a jack at one draw from a 
deck of ordinary playing cards is 1/13 + 1/13 + 1/13 + 1/13 or 
4/13, since the probability of each of the events is 1/13. Over the 
long run, we would expect to observe an ace, king, queen, or jack 
at one draw 4/13 or about .3 of the time. 
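
The addition rule can be checked with exact fractions. The helper name `p_either` is ours, introduced only for illustration; it simply sums the probabilities and is valid only when the events are mutually exclusive.

```python
from fractions import Fraction


def p_either(*probabilities):
    """Addition rule: probability that one of several mutually exclusive events occurs."""
    return sum(probabilities, Fraction(0))


# Ace, king, queen, or jack at one draw: each rank has probability 4/52 = 1/13.
p_rank = Fraction(4, 52)
p_ace_king_queen_or_jack = p_either(p_rank, p_rank, p_rank, p_rank)
print(p_ace_king_queen_or_jack, float(p_ace_king_queen_or_jack))

# Deviation of as much as +1 sigma or -1 sigma in a normal distribution.
p_tail = Fraction(16, 100)
print(float(p_either(p_tail, p_tail)))
```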

The second rule is the multiplication rule. This may be stated: 


If P(A) is the probability of event A, and P(B) the probability of 
event B, the probability that A and B will occur simultaneously or in 
succession is the product of P(A) and P(B), provided the events are
independent. 


Like the addition rule, the multiplication rule can be extended to 
more than two events. By the rule, the probability of obtaining
heads on each of two flips of a coin is 1/2 X 1/2 or .25; the proba-
bility of heads on each of three flips is 1/2 X 1/2 X 1/2 or .125,
and so on.

The multiplication rule can be applied to dependent or related
events, provided the conditional probabilities can be identified. 
For example, the probability of an ace-king sequence in two draws 
from a deck of cards is 4/52 X 4/51. This is so because the proba- 
bility of a king after an ace is drawn is 4/51, four of the remaining 
51 cards being kings. However, the probability of an ace-king 
sequence, if the ace is replaced and the cards reshuffled after the 
first draw, is 4/52 X 4/52, the replacement and reshuffle making 
the events independent in the probability sense. 
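
The card and coin figures above can be reproduced with exact fractions. The variable names are ours; the probabilities are those given in the text.

```python
from fractions import Fraction

# Ace on the first draw.
p_ace_first = Fraction(4, 52)

# King on the second draw when the ace is NOT replaced:
# a conditional probability, four kings among the 51 remaining cards.
p_king_given_ace_removed = Fraction(4, 51)

# King on the second draw when the ace is replaced and the deck reshuffled:
# the two draws are then independent.
p_king_after_replacement = Fraction(4, 52)

p_ace_king_without_replacement = p_ace_first * p_king_given_ace_removed
p_ace_king_with_replacement = p_ace_first * p_king_after_replacement

print(p_ace_king_without_replacement)  # 4/52 X 4/51 in lowest terms
print(p_ace_king_with_replacement)     # 4/52 X 4/52 in lowest terms

# Independent coin flips: heads on each of three flips.
p_three_heads = Fraction(1, 2) ** 3
print(float(p_three_heads))
```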

The addition and multiplication rules frequently are applied 
together. To illustrate, let us determine the probability that in 
tossing three coins once (or one coin three times) we will obtain 
two heads and one tail. There are three ways in which this com-
bination may come about: HHT, first and second coins showing
heads, the third tails; HTH, the first and third heads, the second 
tails; and THH, the first tails, the second and third heads. By the 
multiplication rule, the probability of each permutation or sequence 
is 1/2 X 1/2 X 1/2 or 1/8. Applying the addition rule, we obtain 
1/8 + 1/8 + 1/8 or 3/8 as the probability of the combination of 
two heads and one tail. Over the long run, in tossing three coins, 
we would expect to observe two heads and one tail 3/8 of the time. 
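
The same result can be reached by listing every possible sequence. The sketch below enumerates the eight sequences of three tosses, applies the multiplication rule to each sequence, and then the addition rule to the favorable ones.

```python
from fractions import Fraction
from itertools import product

# All sequences of heads (H) and tails (T) in three tosses of a fair coin.
outcomes = list(product("HT", repeat=3))

# Multiplication rule: each particular sequence has probability 1/2 X 1/2 X 1/2.
p_sequence = Fraction(1, 2) ** 3

# The sequences showing exactly two heads and one tail.
favorable = [outcome for outcome in outcomes if outcome.count("H") == 2]

# Addition rule: the sequences are mutually exclusive, so their probabilities add.
p_two_heads_one_tail = sum((p_sequence for _ in favorable), Fraction(0))

print(["".join(outcome) for outcome in favorable])
print(p_two_heads_one_tail)
```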

Probability figures may be expressed as fractions or as decimal 
numbers, the latter being the more common. They are essentially 
the theoretical relative frequencies of chance events. The way in 
which a probability figure is used to indicate quantitatively the 
degree of uncertainty in a statistical inference will be elaborated in 
later sections. 

The Sampling Distribution. The whole theory of sampling 
and statistical inference is based upon probability and sampling 
distributions. In most sampling situations, there are three dis-
tributions of concern: the sample distribution, the population dis-
tribution, and the sampling distribution. A simple illustration will
help us to distinguish among the three.

Suppose that we wish to know the mean number of hours in the 
working week of some 25,000 high school teachers in Pennsylvania 
and that owing to time or money considerations we can examine 
only a random sample of 250 teachers. The sample distribution 
would be merely the frequency distribution of the hours in the 
working week of the sample of 250. It would be an observed dis- 
tribution and one which would ordinarily be described by certain 
of the summarizing statistics we discussed in Chapters III and IV.
Ordinarily, the sample distribution is the only concrete distribution 
in the sampling situation. 

Now if we had a record of the hours in the working week of all 
of the 25,000 high school teachers in Pennsylvania we could tabu- 
late those hours in a frequency distribution. This would be the 
population distribution. Neither the form nor the parameters of the 
population distribution ordinarily are known, but we can draw 
inferences about them from the sample distribution. In our illus- 
tration, if the sample distribution approximated normality, we 
might infer that the population distribution is normal, ascribing
any irregularities of the sample distribution to sampling fluctu-
ations. If the mean of the sample distribution were, say, 44 hours
and the standard deviation 5 hours, we might infer that the mean
and standard deviation of the population are probably and approxi-
mately equal to 44 and 5 hours, respectively.

Now such inferences are susceptible to sampling errors. We do 
not have in our sample distribution absolute form and fixed mean 
and standard deviation which would be observed again in a second 
sample of 250 from the population. Indeed, if we were to take, say, 
100 samples of 250 each, we would have 100 different sample dis- 
tributions, 100 varying means, and 100 varying standard devi- 
ations. If we grouped the 100 means in a frequency distribution 
we would have an experimental sampling distribution of the mean 
in samples of size 250 from our population; if we grouped the 100 
standard deviations we would have an experimental sampling dis- 
tribution of the standard deviation. If we were to continue sampling 
until we had drawn all possible samples of size 250 from our popu- 
lation of about 25,000, we would arrive at exact sampling dis- 
tributions. This we cannot do, since the number of possible samples 
would be the astronomical number of combinations of some 25,000 
cases taken 250 at a time. 
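
An experimental sampling distribution of this kind is easily simulated. The sketch below is illustrative only: the population of 25,000 is invented (normal, with mean 44 and standard deviation 5, figures chosen to match the illustration), and the seed is arbitrary.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, so that the run can be repeated

# ASSUMPTION: a hypothetical population of 25,000 working weeks,
# normal in form with mean 44 hours and standard deviation 5 hours.
population = [random.gauss(44, 5) for _ in range(25_000)]

# Draw 100 random samples of 250 each and record each sample's mean and
# standard deviation.
sample_means = []
sample_sds = []
for _ in range(100):
    sample = random.sample(population, 250)
    sample_means.append(statistics.mean(sample))
    sample_sds.append(statistics.stdev(sample))

# The 100 means form an experimental sampling distribution of the mean.
print("range of sample means:", min(sample_means), max(sample_means))
print("spread of sample means:", statistics.stdev(sample_means))
print("range of sample SDs:", min(sample_sds), max(sample_sds))

# The number of possible samples: combinations of 25,000 cases taken 250 at a time.
n_possible_samples = math.comb(25_000, 250)
print("digits in the number of possible samples:", len(str(n_possible_samples)))
```

The means vary from sample to sample but cluster closely about the population mean, and the count of possible samples runs to hundreds of digits, which is why exact sampling distributions cannot be obtained by experiment.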

Except as a check and clarification of theory, experimental 
sampling distributions are of little use. Fortunately, it is possible 
to determine by analytic methods the distribution of a sample 
statistic, such as a mean or standard deviation, in all possible 
samples of a given size N, provided the population distribution can 
be assumed to be normal. This distribution is known as the sampling
distribution of the given statistic. It is a theoretical or idealized
distribution, but may be thought of as the distribution which would 
result if the values of a given statistic were computed in all possible 
samples of a given size actually drawn from the specified population. 

Statisticians have been able to determine the sampling dis- 
tribution of the majority of commonly used statistics in samples 
drawn from a normal population. When the samples are large, many 
of the statistics are distributed normally, or nearly so. In small 
samples, certain statistics are distributed symmetrically in what 
is known as the t distribution. Other sampling distributions of
importance in educational research include the binomial, the χ²
(chi square), and the F distributions. In later sections we shall deal
with the nature and use of each of these five common sampling
distributions.

When the sampling distribution of a statistic is known, it is pos- 
sible to determine the relative frequency with which different 
sample values are expected to occur in random sampling from a 
population in which the statistic has some assumed or hypothesized 
value. Knowing the relative frequency, we at once know the proba-
bility that a particular sample value has resulted from sampling
fluctuations, i.e., chance. It is this probability figure which enables 
us to judge the soundness of an hypothesis regarding the value of 
the statistic in the population. 

The Statistical Hypothesis and Its Test. In a broad sense, the
term hypothesis refers to a tentative statement or proposition which 
may explain observed facts. The crucial step in research is that of 
testing the hypothesis, and this test is always the common sense 
one of determining whether the hypothesis is consistent with the 
facts. Facts and hypothesis are in reciprocal relationship: the facts 
suggest and support the hypothesis; the hypothesis explains or 
accounts for the facts. 

The necessity of hypothesis in statistical inference stems out of 
our lack of knowledge about the population. Ordinarily the only 
information we have about population form and parameters is 
that provided by the sample. Because of sampling errors, such 
information cannot be accepted at its face value. It can be used, 
however, to test the reasonableness of the hypotheses we make 
about population form and parameters. 

The most useful and successful method yet devised of testing an 
hypothesis is based upon the assumption that the hypothesis is 
true. The hypothesis is developed by "if-then" argument. When
the "thens" or expectations which logically follow if the hypothesis
is true are consistent with observable facts, the hypothesis is con- 
sidered tenable; if not, it is rejected. 

In statistics, an hypothesis which is tested for possible rejection 
under the assumption that it is true commonly is called a null 
hypothesis. Essentially the null hypothesis assumes a particular 
value of a population parameter, and the hypothesis is tested by 
determining whether the sample in hand could reasonably have 
arisen in sampling from a population actually having this assumed 
parameter value. If so, the hypothesis is tenable. This does not
mean that the hypothesis is proved, but only that it is acceptable, 
perhaps one of several acceptable hypotheses. 

In illustration of this somewhat roundabout procedure, suppose 
that we wish to know whether it is reasonable to believe that the 
mean IQ of workers in a certain occupation in a given city is, say,
100, and suppose that in a random sample of N of these workers we
find a mean IQ of 110. The hypothesis we wish to test is that the 
population mean is 100. Assuming the hypothesis to be true, the 
sampling distribution of the mean in samples of size N has a mean 
(mean of means) of 100, and the expected value of a single sample 
mean is therefore 100. If the difference between the observed mean 
110 and the expected mean 100 can be reasonably attributed to 
sampling fluctuations, the hypothesis that the population mean is 
100 is tenable; if not, the hypothesis is discredited. Various other 
hypotheses about the population mean might of course be tested. 
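
The text leaves N unspecified, so the sketch below tries several hypothetical sample sizes. It also assumes a population standard deviation of 15 IQ points and a normal sampling distribution of the mean with spread sd/sqrt(N). None of these values is given in the text, and with samples this small a different sampling distribution might well be preferred.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_probability(sample_mean, hypothesized_mean, sd, n):
    """Probability of a difference at least this large, in either direction,
    arising from sampling fluctuations if the hypothesis is true.

    ASSUMPTION: the sampling distribution of the mean is normal,
    with standard error sd / sqrt(n).
    """
    standard_error = sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Observed mean 110 and hypothesized mean 100 are from the text;
# sd = 15 and the values of N are hypothetical.
for n in (4, 9, 25):
    p = two_sided_probability(110, 100, sd=15, n=n)
    print(f"N = {n:2d}: probability = {p:.4f}")
```

The same observed difference of 10 points is easily attributed to chance in a very small sample and far less easily in a larger one.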

In a perfectly general sense, the testing of an hypothesis consists 
in determining whether some discrepancy or difference between 
expectation and observation can reasonably be ascribed to chance. 
This determination typically is expressed in terms of probability— 
the smaller the probability, the less reasonable it is to suppose that 
the difference is due to chance. Stated another way, the smaller the 
probability, the more reasonable it is to conclude that there is a 
real or nonchance difference. It follows that to accept the null hy-
pothesis is to conclude that the observed difference is due to chance; to
reject the null hypothesis is to conclude that the difference is nonchance
or real.

The important question of how small a probability must be 
before a real difference is demonstrated does not permit a general 
or final answer. An hypothesis does not suddenly or clearly become 
untenable at some probability figure. The best that can be done is 
to judge the unsoundness or falsity of an hypothesis in light of
levels of significance. When an observed difference would arise 
owing to the chance fluctuations of sampling not more than five 
times in 100 or 5 per cent of the time, the hypothesis is discredited 
at the 5 per cent level of significance; when it would arise owing to 
chance not more than .01 or 1 per cent of the time, the hypothesis 
is discredited at the 1 per cent level of significance; and so on; the 
lower the level of significance, the less tenable the hypothesis. 

It is becoming the fairly common practice to reject an hypothesis
when p < .05* and to accept it when p > .05, but it is open to 
the investigator to adopt a more or less exacting level of significance 
before he concludes that the observed difference is due or not due 
to chance. Moreover, one level of significance may be appropriate 
in a particular sampling situation, whereas a more exacting level 
may be judged desirable in another. This matter will be taken up 
below in connection with mistakes in testing hypotheses. 

The limits of probability figures are 0 and 1. The nearer a proba- 
bility figure P approaches the lower limit 0, the less tenable the null 
hypothesis. When P falls in the .05-.95 interval, say, the hypothesis 
ordinarily is considered acceptable. When P falls above .95, the 
assumptions underlying the sampling distribution or the sampling 
procedures are questionable, and the nearer P approaches its upper 
limit 1, the more questionable they become. Agreement between 
observation and expectation so close as to yield a P of .95 or more 
arises 5 per cent or less of the time in random sampling. Such close 
agreement is open to suspicion as “too good to be true.” 

Tests of statistical hypotheses commonly are called tests of signifi-
cance. In summary statement, the purpose of a test of significance
is always to determine the probability that an observed difference 
between the sample value of some statistic and the value proposed 
by an hypothesis could result from the fluctuations of random 
sampling. 

* It is interesting to note that the 5 per cent level of significance roughly
checks with our intuitive reaction to the working of chance in the following
situation. Suppose someone hands us a coin, asking us to flip it several times,
without examining the coin. Suppose now that on the first three flips we observe
heads. The probability of this sequence, if the coin is unbiased, is 1/2 X 1/2
X 1/2 or about .12. Although at this point we would tend to have some doubt
about the fairness of the coin, few of us would be ready to conclude that the
coin is unfair. Now suppose that we observe heads on the fourth and fifth flips.
The probability of the 4-head sequence is 1/16 or about .06, that of the 5-head
sequence 1/32 or about .03. Most of us would begin to have quite serious doubt
about the fairness of the coin after the 4-head sequence, and our doubt would
sharply increase with the 5-head sequence. In other words, in this situation, we
intuitively tend to reject the notion that chance is a reasonable explanation of
happenings characterized by probability figures in the .03-.06 neighborhood.

Mistakes in Testing Hypotheses. An hypothesis about a popu-
lation obviously is either true or false, but unless we examine the
entire population we cannot be certain which it is. When only a
sample of evidence is available, as is generally the case, there is
always the possibility of rejecting an hypothesis which is in fact 
true or of accepting an hypothesis which is in fact false. These 
possibilities are always present, because sample evidence permits 
only probability statements against or for hypotheses. Samples 
having very small probabilities, if the hypothesis is true, do occa- 
sionally occur; on the other hand, samples having large proba- 
bilities do occasionally occur even though the hypothesis is false. 
To illustrate, suppose that we hypothesize that the mean of a 
population is 40. Even though this hypothesis may in fact be true,
a few sample means, of the many possible, would show an extreme 
and therefore highly improbable deviation from the expected value 
40. On the other hand, a few sample means may be 40 or there- 
abouts and therefore highly probable even though the hypothesis 
may in fact be false. Since there is no way of being sure that the 
sample in hand is not one of these relatively rare samples, we can 
never be sure that an inference is not mistaken. 

The test of a particular hypothesis about a population obviously 
will terminate in one of four results: (1) a true hypothesis will be 
accepted, (2) a false hypothesis will be rejected, (3) a true hypothesis 
will be rejected, or (4) a false hypothesis will be accepted. There is 
no mistake in the first or second result, and it is the aim of sta-
tistical test to achieve one or the other, i.e., to accept hypotheses
which are in fact true and to reject hypotheses which are in fact
false. Stated negatively, it is the aim of statistical test to avoid
rejecting a true hypothesis (commonly called a Type I mistake
or error) and to avoid accepting a false hypothesis (Type II mistake
or error).

A great deal of statistical theory is concerned with problems of 
reducing and controlling the dangers of these two types of errors. 
The problems are not simple, because, for fixed sample size, to 
reduce the risk of the first type is to increase the risk of the second. 
By choosing a .01 probability figure (1 per cent level of significance) 
instead of a .05, one can reduce the risk of rejecting a true hypothe- 
sis, and can further reduce it by choosing a .001 or 0.1 per cent 
level. Over the long run, using the 5 per cent level, one will 
reject not more than 5 true hypotheses in 100; using the 1 per 
cent level, not more than 1 in 100; using the 0.1 per cent level, 
not more than 1 in 1,000. Clearly, one can make the risk of a Type I
error as small as one pleases. However, since hypotheses not rejected 
are considered acceptable, any reduction in the risk of rejecting a 
true hypothesis is inevitably accompanied by an increase in the 
risk of accepting a false one. 
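
The long-run meaning of these figures can be checked by simulation. In the sketch below the hypothesis is true by construction, so every rejection is a Type I error. The population values (mean 40, standard deviation 5), the sample size of 25, the number of trials, and the seed are all hypothetical choices, and a normal sampling distribution with known standard deviation is assumed.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(2)  # arbitrary seed, so that the run can be repeated

# ASSUMPTION: hypothetical normal population in which the hypothesis (mean = 40) is TRUE.
TRUE_MEAN, SD, N = 40.0, 5.0, 25
STANDARD_ERROR = SD / sqrt(N)
STANDARD_NORMAL = NormalDist()


def simulated_p_values(trials):
    """Draw repeated samples and return the two-sided probability figure for each."""
    p_values = []
    for _ in range(trials):
        sample_mean = sum(random.gauss(TRUE_MEAN, SD) for _ in range(N)) / N
        z = (sample_mean - TRUE_MEAN) / STANDARD_ERROR
        p_values.append(2 * (1 - STANDARD_NORMAL.cdf(abs(z))))
    return p_values


p_values = simulated_p_values(10_000)


def rejection_rate(level):
    """Proportion of the simulated true hypotheses rejected at the given level."""
    return sum(p < level for p in p_values) / len(p_values)


for level in (0.05, 0.01, 0.001):
    print(f"level {level}: true hypothesis rejected in {rejection_rate(level):.4f} of samples")
```

The observed rejection rates fall close to the levels themselves, which is the sense in which the level of significance limits the risk of a Type I error.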

This dilemma can be portrayed graphically. Consider a situation
in which it is known that the sampling distribution of a statistic,
say the mean, is normal in form and suppose the hypothesis that
the mean M* of the population has a particular value is being
tested. In this situation, if the hypothesis is true, the mean of the
sampling distribution is equal to M, as shown in Fig. 8.1. Now if
the mean of a random sample of size N differs from the expected
value M sufficiently to fall in one or the other of the tail portions
marked off in the figure, the hypothesis can be rejected at the 5
per cent level of significance, since the relative frequency or proba-
bility of samples of size N having means which differ positively or
negatively from M by as much as the given sample is .05. The two
tail portions combined constitute what is called the critical region or
the region of rejection. On the other hand, if the mean of the sample
has a value sufficiently close to M to fall in the region between the
two tail portions or the region of acceptance, the hypothesis is accept-
able, in the sense that it cannot be rejected at the level adopted.

Fig. 8.1. A two-sided .05 region of rejection in a normal sampling
distribution. (The region of acceptance lies between the points marked
-1.96 and +1.96 on either side of M; the two tails beyond these points
form the region of rejection.)

* We shall use either symbol and caret or symbol and subscript to indicate
population parameters. Thus, M or Xp will always refer to the mean of the
population. When it is desirable to distinguish between the actual parameter
and the hypothesized parameter, we shall indicate the latter by the subscript
H. Thus, M_H would refer to the hypothesized mean of the population. Since
the null hypothesis assumes that the parameter in question is equal to its
hypothesized value, it is usually not necessary to distinguish between actual
and hypothesized parameter.

It will be seen that the proportion of area in the region of rejection
corresponds to the probability of rejecting an hypothesis which is in
fact true or the probability of a Type I error. If this region is made
smaller, the risk of rejecting a true hypothesis is decreased. If this 
is done, however, the region of acceptance necessarily becomes 
larger. When the region of acceptance is increased in size, a greater 
discrepancy between sample value and expected value is tolerated 
before the hypothesis is declared false. As a consequence, the risk 
of accepting a false hypothesis, the Type II error, is increased. 
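
The trade-off can be put in figures under simple assumptions. The sketch below supposes a normal sampling distribution, a two-sided region of rejection, and a true mean lying 2.5 standard errors above the hypothesized mean. The value 2.5 is purely hypothetical, and the function name is ours.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def type_two_risk(level, shift_in_standard_errors):
    """Probability that the sample mean falls in the region of acceptance
    when the true mean differs from the hypothesized mean by the given
    number of standard errors (two-sided region of rejection).
    """
    z_crit = STANDARD_NORMAL.inv_cdf(1 - level / 2)
    upper = STANDARD_NORMAL.cdf(z_crit - shift_in_standard_errors)
    lower = STANDARD_NORMAL.cdf(-z_crit - shift_in_standard_errors)
    return upper - lower


# ASSUMPTION: the hypothesis is false by 2.5 standard errors (hypothetical value).
for level in (0.05, 0.01, 0.001):
    risk = type_two_risk(level, 2.5)
    print(f"Type I risk {level}: Type II risk = {risk:.3f}")
```

As the region of rejection is made smaller, the risk of accepting this particular false hypothesis rises steadily.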

Controlling the Risk of Error in Testing Hypotheses. In the 
attempt to control the risk of the Type I error, the usual procedure 
is to limit the risk of rejecting a true hypothesis to some preassigned 
amount, e.g., .05, .01, or .001. The selection of the probability
figure depends largely upon the nature of the problem. If the conse- 
quences of rejecting an hypothesis which is in fact true are of serious 
concern, a low probability figure, perhaps .01 or .005 or even a
smaller one, would be selected. If, however, the consequences of 
rejecting a true hypothesis are relatively unimportant, one may 
wish to reject an hypothesis if there is even slight evidence against 
it. In this case a high probability figure, perhaps one as high as 
.10, may be desirable. 

Consider a situation, for example, in which a new method of 
teaching physics is being tried out in samples of students which 
can be considered representative of the population of future students 
who will study physics. If the new method would require radical 
or expensive changes in class size, school routine, or equipment, the 
school officials likely would hesitate to reject the hypothesis that 
the new method is no better than the old unless the evidence against 
it were strong. But if the new method were as practicable as the 
old and no more expensive, the hypothesis likely would be rejected 
if there were even relatively slight evidence against it. 

In general, the more serious the consequences of rejecting a true 
hypothesis, the lower the probability figure or the smaller the 
region of rejection one would select. 

After a probability figure has been selected to limit the risk of
rejecting a true hypothesis to some desired amount, the risk of
accepting a false hypothesis (Type II error) must be considered.*
As we have seen, the risk of the latter is inevitably increased when
the risk of the former is decreased. The risk of accepting a false
hypothesis is further affected by the relation of the actual value
of the population parameter to its hypothesized value. If the
hypothesis is in fact false, the actual value of the parameter obvi-
ously must be either less than or greater than the value stated by
the hypothesis. The way in which these two alternatives influence
procedures in testing hypotheses is best seen in reference to possible
regions of rejection and acceptance in a sampling distribution
used in testing hypotheses about, say, a population mean. One
arrangement of such regions is shown in Fig. 8.1. It is shown again
with two other possible arrangements in Fig. 8.2. (Smaller or larger
regions of rejection could of course be chosen.) The risk of rejecting
an hypothesis concerning the value of M which is in fact true is .05
for each of the three arrangements, but the risk of accepting a false
hypothesis is partly determined by the relation of the actual value
of M to its hypothesized value. If the hypothesis is false because M
is actually less than its hypothesized value (see Fig. 8.3), in succes-
sive sampling a greater number of sample means will fall in the
(a) region of rejection than in the (b) or (c) region. Hence, over the
long run the false hypothesis will be rejected a greater number of
times and accepted fewer times if the (a) region is selected. On the
other hand if the hypothesis is false because M is actually greater
than its hypothesized value, in successive sampling a greater number
of sample means will fall in the (c) region of rejection than in the
(a) or (b) region. Hence, in case M is actually greater than its
hypothesized value, the (c) region is best, in the sense that it re-
duces the risk of accepting the false hypothesis to a minimum.
Neither the (a) nor the (c) region, however, is a good safeguard
against the opposite alternative. The (b) region effects a compromise
and is the one commonly used, if neither alternative is of more
serious concern than the other, as is usually the case in educational
research.

* In technical language the probability that a statistical test will, under
given conditions, lead to the rejection of a false hypothesis is called the power
of the test. See Ref. 3.

In the real sampling situation, of course, we do not know the 
relation of a parameter to its hypothesized value, and our selection 
of a region of rejection is governed by the nature of the hypothesis 
to be tested. From the theoretical considerations of the preceding 
paragraph, it follows that the (a) region of Fig. 8.2 is the appropriate 
one to employ in testing the hypothesis that the parameter in
question, say the mean M, is not less than a specified or proposed
value, say G. This hypothesis would be expressed, H: M ≥ G. If
the hypothesis is false it must be because M is actually less than G.
Should this be the case, the (a) region minimizes the danger of
accepting it.


Fig. 8.2. Three arrangements of regions of rejection in a
normal sampling distribution which limit the risk of
rejecting a true hypothesis to .05: (a) region of rejection in the
lower tail; (b) regions of rejection in both tails, beyond -1.96 and
+1.96; (c) region of rejection in the upper tail.

Fig. 8.3. Assumed (solid line) and true (broken line) sampling
distributions in case M is in fact less than its hypothesized value M_H.
In this case the probability of a sample mean falling in a region of
rejection is greatest for (a).

The same sort of reasoning leads to the conclusion that the (c)
region is the appropriate one to employ in testing the hypothesis
that the parameter in question is not greater than a proposed value,
or, in the symbols used above, H: M ≤ G.

The (b) region is the appropriate one to employ in testing the
exact hypothesis that the parameter in question is equal to a pro-
posed value. This hypothesis is false if the parameter is either
greater than or less than the proposed value, and the (b) region
covers both alternatives equally well. In this sense it is unbiased,
but obviously it is not the one to employ if either alternative is of
more concern than the other.

Let us note again that, in all three arrangements of Fig. 8.2, the
risk of rejecting a true hypothesis (Type I error) is the same. This 
risk is controlled by the size of the region of rejection, not its loca- 
tion. The nature of the problem ordinarily suggests both the ap- 
propriate size and the appropriate location of the region of rejection. 
In the sections following, these general ideas will be applied to real 
sampling problems. 

Both the size of the sample and the statistic employed in testing 
hypotheses about a population have bearing upon the risk of mis- 
taken inferences. We shall later see that the variability or spread 
of a sampling distribution is decreased if the size of the sample is 
increased. In other words, the larger the sample, the less a sample 
statistic varies in successive sampling. It follows that, as sample 
size is increased, greater precision in testing hypotheses is possible, 
with less danger of mistaken inferences. 

Different statistics may be employed in testing hypotheses. For 
example, the median as well as the mean may be used in testing an 
hypothesis regarding the point of central tendency of a normal 
population. However, the sampling distribution of the median is 
characterized by greater variation or spread than that of the mean. 
The use of the mean in preference to the median is, therefore, 
equivalent to employing a larger sample. A statistic which shows 
less variation in successive samples than the other statistics of its 
class is said to be efficient. 
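
The greater spread of the median can be seen in a small sampling experiment. What follows is a hedged sketch only: the population (normal, mean 0, standard deviation 1), the sample size of 25, the 2,000 samples, and the seed are all assumptions made for illustration, and `spread_of_statistic` is a hypothetical helper.

```python
import random
import statistics

def spread_of_statistic(stat, n_samples=2000, sample_size=25, seed=1):
    """Standard deviation of `stat` over repeated random samples
    from a normal population (illustrative settings, not from the text)."""
    rng = random.Random(seed)
    values = [
        stat([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]
    return statistics.pstdev(values)

# The same seed is used for both, so both statistics see the same samples.
sd_of_means = spread_of_statistic(statistics.mean)
sd_of_medians = spread_of_statistic(statistics.median)

print("spread of sample means:  ", round(sd_of_means, 3))
print("spread of sample medians:", round(sd_of_medians, 3))
```

The medians should vary noticeably more than the means from sample to sample, which is the sense in which the mean is the more efficient statistic for a normal population.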

Estimation of Population Parameters. It is sometimes the 
case that neither past experience nor the nature of the problem 
suggests a particular hypothesis to be tested. Even when a particu- 
lar hypothesis is suggested, the statistician may wish to determine 
what hypotheses in general are acceptable and what are not, or to 
determine the single hypothesis which is best supported by a sample 
in hand. In these situations, the sample statistics are used to estimate 
probable values of the corresponding population parameters. 

There are two problems in statistical estimation. One of these is
concerned with finding a single value which can be considered the
“best” estimate of a parameter that can be made from a given
sample statistic. This is the problem of point estimation. In modern
statistical theory, this "best" estimate is the one determined by
Fisher's method of maximum likelihood. (See Refs. 5 and 6.) It is
generally true that the maximum likelihood estimate of a parameter
is either the value of the corresponding sample statistic or a value
readily obtained from it. For example, the best single value or
point estimate of the mean of a normal population is the mean of
the sample; the best point estimate of the standard deviation of the
normal population, independent of the mean, is the standard devia-
tion of the sample multiplied by $\sqrt{N/(N-1)}$.

Point estimates, although of great interest and importance in 
statistical theory, are not very meaningful in the practical situation 
unless accompanied by a statement regarding possible error in the 
estimates. For this reason, interval estimation, rather than point 
estimation, is ordinarily employed in practice. The problem of 
interval estimation is that of determining an interval or range of 
values which has a specified probability of covering the parameter 
in question. If the probability is .95 that the value of the parameter 
is in fact covered by the interval, the interval is designated the 95 
per cent confidence interval and the end points of the interval the 
95 per cent confidence limits of the parameter. The 99 per cent con- 
fidence limits are the end points of the interval having a .99 proba- 
bility of covering the value of the parameter. 

As a rule, given the value of a sample statistic, the lower limit of,
say, the 95 per cent confidence interval of the corresponding param-
eter is found by determining a value of the parameter such that the
probability of a sample value equal to or greater than the one ob-
served is .025. The upper limit is found by determining a value of
the parameter such that the probability of a sample value equal to
or less than the one observed is .025. Over the long run, the interval
thus determined must include the parameter 95 per cent of the time.
It follows, of course, that any hypothesis proposing a value of the
parameter not included by the 95 per cent confidence interval can
be rejected at least at the 5 per cent level of significance, while one
proposing a value not included by the 99 per cent confidence interval
can be rejected at least at the 1 per cent level of significance; and
so on.
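
For a statistic whose sampling distribution is normal with a known standard error, the rule just stated reduces to a simple computation. The sketch below is illustrative only: the sample mean of 50 and the standard error of 2 are invented figures, and `confidence_limits` is a hypothetical name.

```python
from statistics import NormalDist

def confidence_limits(sample_mean, standard_error, confidence=0.95):
    """Confidence limits for a mean, assuming a normal sampling
    distribution with known standard error (an assumption of this sketch)."""
    tail = (1 - confidence) / 2                # .025 in each tail for 95 per cent
    z = NormalDist().inv_cdf(1 - tail)         # about 1.96 for 95 per cent
    return sample_mean - z * standard_error, sample_mean + z * standard_error

lower, upper = confidence_limits(50.0, 2.0)

# Check the defining property of the lower limit: if the parameter were
# equal to `lower`, a sample mean of 50 or more would occur with
# probability .025.
tail_at_lower = 1 - NormalDist(lower, 2.0).cdf(50.0)

print(round(lower, 2), round(upper, 2), round(tail_at_lower, 3))
```

Any hypothesized value outside the limits so obtained would be rejected at the 5 per cent level by the (b) arrangement of regions.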

Ordinarily, either a 95 per cent or a 99 per cent confidence interval
is estimated, although in some situations it is desirable to determine
both. Intervals having less or greater probability of covering the
value of the parameter can, of course, be estimated. The disadvan-
tage of high confidence intervals is that, for fixed sample size, such
intervals are relatively wide and hence lacking in precision. In
practice, the statistician ordinarily is willing to sacrifice some confi-
dence in order to obtain a closer estimate. For this reason, the
95 per cent interval is the one most frequently employed. Much
depends, of course, upon the nature of the problem. If it is extremely
important that the interval cover the value of the parameter, a
high confidence interval, perhaps one as high as 99.9 per cent,
would be estimated.

As pointed out previously, the variability or spread of the sam- 
pling distribution decreases as the size of the sample increases. As a 
result, any particular confidence interval is narrowed as sample size 
increases. In other words, the larger a sample the more precise the 
estimate it affords, at no expense to confidence. 
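
The trade between confidence and precision, and the effect of sample size, can be tabulated directly. This sketch assumes, beyond what the text states at this point, that the statistic is a mean whose standard error is σ divided by the square root of N, with σ = 10 chosen arbitrarily; `interval_width` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Width of a confidence interval for a mean, assuming a normal
    sampling distribution with standard error sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

# Illustrative values only: sigma = 10, three sample sizes, two confidence levels.
widths = {
    (n, c): round(interval_width(10.0, n, c), 2)
    for n in (25, 100, 400)
    for c in (0.95, 0.99)
}

for (n, c), width in sorted(widths.items()):
    print(f"N = {n:3d}, confidence = {c:.2f}: width = {width}")
```

Under these assumptions, quadrupling the sample halves the width of the interval at any given level of confidence, while raising the confidence from 95 to 99 per cent widens it.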

We have already made some use of estimation techniques. In
judging the reliability of predicted criterion scores (pp. 283-286) 
and the reliability of obtained scores (pp. 349-350) we really deter- 
mined ranges which had a specified probability of including the true 
values. There we were concerned with confidence intervals of single 
true scores, rather than with confidence intervals of parameters, 
but the underlying idea is the same. In the sections following we 
shall make further application of estimation techniques. 

Significance and Reliability of Sample Statistics. In prac- 
tical research it is desirable to distinguish between the significance 
and the reliability of sample statistics. As used in research, the term 
significance is limited to mean the probable existence of a nonchance 
difference between observation and expectation or between fact 
and hypothesis. Tests of significance relate to the single question: 
Is it reasonable to suppose that an observed difference has occurred
as a result of random fluctuations of sampling? The question typically 
is answered by testing the null hypothesis. A difference is significant 
or not significant depending upon whether it cannot be or can be 
reasonably attributed to sampling fluctuations or chance. The 
probability that an observed difference could have arisen due to 
chance gives objective meaning to the term significance. 

The reliability of a statistic, such as a mean or a measure of rela-
tionship, depends upon the extent to which the statistic can be
expected to fluctuate in successive similar samples from the same
population. The less the expected fluctuation, the more reliable the
sample statistic is as an estimate of the corresponding population
parameter. The reliability of a statistic usually is described in terms
of confidence intervals.

Unless it has statistical significance, of course, sample information 
is not reliable, since, for all we can tell, chance may have operated 
to produce it. Sample information, however, can be statistically 
significant without possessing sufficient reliability to permit service- 
able estimates or useful prediction. This was pointed out earlier 
in connection with multiple correlation and regression (p. 314).
Small sample information, in particular, usually is of low reliability. 
There is no exception to the rule that, other things being equal, 
the larger the sample the more reliable the information it provides. 

Sometimes the attempt is made to distinguish between the sta- 
tistical significance and the practical significance or utility of sample 
information, the latter being judged in light of the practical con- 
sequences of the information. The distinction is both difficult to 
make and insubstantial. In the main, significant information sooner 
or later contributes to understanding and control of the environ- 
ment, if only because it singles out nonchance relationships or con- 
nects favorably or unfavorably with accepted theories. Moreover, 
it is frequently the case that significant information, when sample 
size is increased, becomes sufficiently reliable to permit useful estima- 
tion and prediction. 

Concluding Remarks. Statistical inference is concerned either 
with testing hypotheses or with estimation. In order to use a sample 
statistic in testing an hypothesis about the value of the correspond- 
ing population parameter or in estimating the probable value of the 
parameter, we must first of all know the sampling distribution of 
the statistic. The general ideas underlying the use of the sampling 
distribution have been discussed in this section; the application of 
these ideas to the binomial, the normal, the t, the χ², and the F
sampling distributions will be illustrated in the following sections. 


Exercises 


1. A principal interviewed all of the students who dropped out of school
   during a certain year, hoping thereby to determine what conditions
   needed to be corrected in order to decrease the number of dropouts.
   He claimed that, since he interviewed all dropouts, he had no sampling
   problem. Do you agree? If not, what is the population of concern?
   Under what conditions can the principal make inferences about this
   population?

2. If you had to choose between a small random sample and a large non-
   random one, which would you take? Why?

3. What are the two broad questions with which statistical inference is
   concerned? In what way are these questions related?

4. In order to be certain regarding the form of a population distribution
   or the values of its parameters, what would one have to do?

5. An urn contains 4 red balls and 6 black ones. What is the probability
   that a ball drawn at random will be red?

6. In terms of relative frequency, what does it mean to say (a) that the
   probability of drawing a red ball from an urn is .4, (b) that in rolling
   a single die the probability of a 5-spot is 1/6, (c) that the probability
   that a man of given age will live to be 65 is .3?

7. According to its classical definition, probability is the ratio of the
   number of “favorable” cases to the total number of equally likely
   cases. In what way is the definition circular?

8. The distribution of worker accidents in an industrial plant during a
   certain period was:

   NUMBER OF ACCIDENTS      NUMBER OF
   PER WORKER               WORKERS

   [Table values illegible in the source.]

   (a) What is the relative frequency of workers who had more than 1 ac-
   cident during the period? (b) What is the probability that a worker
   picked at random will have had more than 1 accident during the year?

9. If a single score is drawn at random from a normal distribution of
   scores, what is the probability that it will fall in the interval M ± 1σ?

10. Suppose you roll a pair of unbiased dice once. What is the probability
    of obtaining 10 or more spots?

11. If you tossed 4 unbiased coins 16 times, how many times would you
    expect to obtain 2 heads and 2 tails? What is the probability of 2 heads
    and 2 tails on a single toss of 4 coins? (Ans. 3/8.)

12. If the books of a three-volume set are placed at random on a shelf,
    what is the probability that they will be in correct order? (Ans. 1/6.)

13. An urn contains 2 white and 3 black balls. Two balls are drawn at
    random and replaced by red balls. If then 2 balls are drawn at random,
    what is the probability that they are of the same color? (Ans. .22.)

14. Bertrand's box paradox: Three boxes have in them 2 coins each. In one
    box, both coins are gold, in one both are silver, in the other they are
    mixed. Outside, the boxes are of identical appearance. A man chooses
    a box and takes out a coin which proves to be gold. What is the chance
    that the other coin in his box is also gold? (Ans. 2/3.)

15. An investigator determined the mean IQ in a random sample of 100
    twelve-year old children of foreign-born parents in a certain city.
    Identify the population, the sample, and the sampling distribution in
    this situation.

16. Suppose you were asked to determine experimentally the sampling
    distribution in (15) above. How would you proceed?

17. Suppose it is known that the sampling distribution of the mean of
    samples of size N from a given population is normal with a mean of
    10.00 and a standard deviation of 1.00. What is the relative frequency
    or probability of samples having a mean of 11 or more? A mean of
    12 or more? A mean between 9 and 11? If 1,000 samples of size N were
    taken from this population, how many would you expect to have a
    mean of 9 or less?

18. What is a null hypothesis? What does it mean to reject a null hypothe-
    sis? To accept one?

19. What are the limits of probability figures? Can you think of an event
    characterized by a probability figure of 0? Of 1?

20. a. What are the two types of mistakes or errors one may make in
       testing hypotheses?
    b. How can the risk of making either type of error be made very small?
    c. The phrase "level of significance" is associated with the risk of
       making which type of error?

21. The three most widely used sets of rejection and acceptance are shown
    in Fig. 8.2. Various other sets are of course possible. For example,
    one might employ a 5 per cent region of rejection comprising 1 per cent
    of the area in the left tail and 4 per cent in the right tail. When might
    such a set be appropriate?

22. Distinguish between point estimation and interval estimation. Why is
    the latter ordinarily preferred in practice?

23. Distinguish between the significance and the reliability of sample data.


The Binomial Sampling Distribution*


The binomial sampling distribution has several interesting 
features. It is the oldest known sampling distribution; it rests upon 
a straightforward and easily followed application of elementary 
probability and sampling theory; and it is an exact sampling dis- 
tribution. Although some of the problems to which the distribution 
is applicable can better be dealt with by the chi square technique, 
the distribution remains of considerable theoretical interest and 
practical importance. 

The binomial distribution arises in the sampling of attributes. 
Consider a population such that each member has or does not 
have a given attribute A. Such populations are very common in 
research. As examples, we may cite a specified group of voters who 
are for or are not for a certain proposal or a certain political candi- 
date; youth who have been or have not been brought before a 
juvenile court; individuals who are Protestant or non-Protestant; 
adults who did or did not attend college; manufactured articles 
which are or are not defective; hospital patients who did or did 
not have a given treatment; and examinees who did or did not pass 
a given test item. Such populations are commonly referred to as 
twofold populations. In all such populations every member is char- 
acterized by the presence or the absence of the attribute under 
consideration. The problem which most commonly arises in in- 
vestigating twofold populations is that of drawing inferences from 
sample evidence about the proportion of individuals in the popu- 
lation who have the given attribute. Before dealing with that 
problem we shall need to examine certain properties of the binomial 
distribution.

The Theoretical Binomial Distribution. As a concrete ap-
proach to the binomial sampling distribution, let us consider a
twofold population comprising adults in a specified region who are
in favor of or opposed to universal military training. We shall
designate the attribute, “in favor of universal military training,”
as A; “opposed” (absence of the attribute) as Z; the proportion
of A's in the population as p, and the proportion of Z's as q, so that
p + q = 1; and we shall suppose that the A's and Z's are equal in
the population, so that p = q = 1/2. The fact that we never know
p or q in a real sampling problem is immaterial to our development
of the theoretical distribution.


* If it is desired to treat the sampling distribution of a proportion as a normal
distribution with mean p and variance pq/N, or to utilize the χ² distribution
in making inferences about the twofold population, this section may be omitted
with little loss of continuity.

Let us first determine the possible frequencies of A's in samples
of various sizes from the population. In a random sample of 2,
the possibilities are that (1) both are A's; (2) the first is A,
the second Z; (3) the first is Z, the second A; and (4) both
are Z's. If the population is so large in comparison to the size
of the sample that p (and q) are, for practical purposes, constant,
the relative frequency or probability of each of the four occurrences
is, by the multiplication rule, 1/2 × 1/2 or 1/4. Hence, the proba-
bility of 2 A's is 1/4; the probability of 1 A is, by the addition rule,
1/4 + 1/4 or 1/2; and the probability of no A's is 1/4. If we took
many random samples of 2 each, we would expect 2 A's, 1 A, and
0 A's to appear in the ratio 1:2:1.

In a sample of 3, the possibilities are AAA, AAZ, AZA, ZAA, 
AZZ, ZAZ, ZZA, and ZZZ. Classifying the eight possibilities 
according to the number of A's we have 


NUMBER                     RELATIVE
OF A's      FREQUENCY      FREQUENCY
  3             1             1/8
  2             3             3/8
  1             3             3/8
  0             1             1/8


If we took many random samples of 3 each, we would expect 3 A's, 
2 A's, 1 A, and 0 A's to appear in the ratio 1:3:3:1. 

In a sample of 4, the possibilities are AAAA, AAAZ, AAZA,
AZAA, ZAAA, AAZZ, AZZA, ZZAA, AZAZ, ZAZA, ZAAZ,
AZZZ, ZAZZ, ZZAZ, ZZZA, ZZZZ. Classifying these we have


NUMBER                     RELATIVE
OF A's      FREQUENCY      FREQUENCY
  4             1             1/16
  3             4             4/16
  2             6             6/16
  1             4             4/16
  0             1             1/16

as the distribution we would expect the number of A's in many
samples of 4 each to follow.
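
The listing and classifying of possibilities can be done mechanically. The sketch below is an illustration only; it assumes, as in the text, that A and Z are equally likely at each draw, so that every ordered arrangement has the same probability. The function name `count_by_number_of_as` is hypothetical.

```python
from collections import Counter
from itertools import product

def count_by_number_of_as(sample_size):
    """List every ordered arrangement of A's and Z's in a sample of the
    given size and count the arrangements by their number of A's."""
    outcomes = product("AZ", repeat=sample_size)
    return Counter(outcome.count("A") for outcome in outcomes)

for size in (2, 3, 4):
    counts = count_by_number_of_as(size)
    total = sum(counts.values())                  # 2 ** size arrangements in all
    row = [f"{counts[k]}/{total}" for k in range(size, -1, -1)]
    print(f"sample of {size}:", ", ".join(row))
```

For samples of 2, 3, and 4 the counts come out in the ratios 1:2:1, 1:3:3:1, and 1:4:6:4:1.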

Since the number of possible outcomes is doubled by each unit
increase in sample size, it becomes progressively more difficult to
list them. It can be demonstrated mathematically, however, that
the possibilities and relative frequencies in samples of 5, 6, and 7
each from a population in which p = q = 1/2 are:


     SAMPLE OF 5             SAMPLE OF 6             SAMPLE OF 7
NUMBER    RELATIVE      NUMBER    RELATIVE      NUMBER    RELATIVE
OF A's    FREQUENCY     OF A's    FREQUENCY     OF A's    FREQUENCY
  5         1/32          6         1/64          7         1/128
  4         5/32          5         6/64          6         7/128
  3        10/32          4        15/64          5        21/128
  2        10/32          3        20/64          4        35/128
  1         5/32          2        15/64          3        35/128
  0         1/32          1         6/64          2        21/128
                          0         1/64          1         7/128
                                                  0         1/128


and that the relative frequency of N, N − 1, N − 2, ..., 2, 1,
0 A's in samples of size N are given by the successive terms of the
expanded binomial $(1/2 + 1/2)^N$.

The argument may readily be generalized to sampling situations
in which p (and q) have positive values other than 1/2, provided
p + q = 1. In other words:


Given a twofold population in which the proportion of individuals
having attribute A is p and the proportion of individuals having the
attribute “not-A” is q (= 1 − p). Let the population be of sufficient
size so that the removal of N individuals does not materially change p.
In random samples of N individuals, the relative frequencies of N, N − 1,
N − 2, ..., 2, 1, 0 A's are given by the successive terms of the bi-
nomial expansion $(p + q)^N$.


It follows, of course, from the relative frequency definition of proba-
bility that the exact probability that a single random sample will
contain a specified number of A's is given by the appropriate term
of the expansion. This term is the term in which the exponent of p
equals the specified number. We shall return to this in connection
with testing hypotheses about the twofold population.
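
A single term of the expansion is easily computed. The following minimal sketch uses the standard binomial coefficient; the function names are hypothetical, and the values p = 1/2 and p = .4 are used only as examples.

```python
from math import comb

def binomial_probability(k, n, p):
    """Term of the expansion of (p + q)^n in which the exponent of p is k:
    the probability of exactly k A's in a random sample of n."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

def binomial_distribution(n, p):
    """Successive terms of the expansion, for n, n - 1, ..., 1, 0 A's."""
    return [binomial_probability(k, n, p) for k in range(n, -1, -1)]

print(binomial_distribution(3, 0.5))        # 1/8, 3/8, 3/8, 1/8
print(round(binomial_probability(9, 12, 0.4), 4))
print(round(sum(binomial_distribution(12, 0.4)), 10))
```

The terms for any N and p sum to 1, as they must, since p + q = 1.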

It will be recalled from algebra that the expansion of the general
binomial $(p + q)^N$ is

$$
(p + q)^N = p^N + N p^{N-1} q + \frac{N(N-1)}{(1)(2)}\, p^{N-2} q^2
          + \frac{N(N-1)(N-2)}{(1)(2)(3)}\, p^{N-3} q^3 + \cdots + q^N.
$$

The expansion contains N + 1 terms* and terminates in $q^N$, $q^N$
being the relative frequency or probability of a random sample of
size N containing no A's. If we arrange the expansion in a fre-
quency distribution, we shall have the general binomial distribution:


NUMBER      RELATIVE
OF A's      FREQUENCY
N           $p^N$
N − 1       $N p^{N-1} q$
N − 2       $\frac{N(N-1)}{(1)(2)}\, p^{N-2} q^2$
N − 3       $\frac{N(N-1)(N-2)}{(1)(2)(3)}\, p^{N-3} q^3$
...         ...
0           $q^N$


The mean, standard deviation, and alpha statistics of the general
binomial distribution are:

$$
\begin{aligned}
M &= Np, \\
\sigma &= \sqrt{Npq}, \\
\alpha_3 &= \frac{q - p}{\sqrt{Npq}}, \\
\alpha_4 &= 3 + \frac{1 - 6pq}{Npq}.
\end{aligned}
\tag{8.1}
$$

As a numerical check, the student can compute $M$, $\sigma$, $\alpha_3$, and $\alpha_4$ of
the theoretical distribution for N of 5, 6, or 7 and compare the
computed values with those given by formulas (8.1).
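
The numerical check suggested here may be sketched as follows. The sketch takes formulas (8.1) to read M = Np, σ = √(Npq), α₃ = (q − p)/√(Npq), and α₄ = 3 + (1 − 6pq)/(Npq); that reading, the function names, and the choice of p are assumptions of the example.

```python
from math import comb, sqrt

def moments_from_terms(n, p):
    """Mean, standard deviation, alpha-3 and alpha-4 computed directly
    from the terms of the expansion of (p + q)^n."""
    q = 1 - p
    probs = {k: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}
    mean = sum(k * pr for k, pr in probs.items())
    m2 = sum((k - mean) ** 2 * pr for k, pr in probs.items())
    m3 = sum((k - mean) ** 3 * pr for k, pr in probs.items())
    m4 = sum((k - mean) ** 4 * pr for k, pr in probs.items())
    sd = sqrt(m2)
    return mean, sd, m3 / sd**3, m4 / m2**2

def moments_from_formulas(n, p):
    """The same four statistics from formulas (8.1), as read above."""
    q = 1 - p
    sd = sqrt(n * p * q)
    return n * p, sd, (q - p) / sd, 3 + (1 - 6 * p * q) / (n * p * q)

for n in (5, 6, 7):
    direct = moments_from_terms(n, 0.5)
    formula = moments_from_formulas(n, 0.5)
    print(n, [round(v, 3) for v in direct], [round(v, 3) for v in formula])
```

The two sets of values should agree for every N and p, apart from rounding.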
Experiments in the Sampling of Attributes. The preceding
discussion of the binomial distribution was largely theoretical. Let
us now see whether the frequencies of an attribute A in random
samples actually drawn from a twofold population tend to corre-
spond to the frequencies of the theoretical distribution.

A twofold population is simulated in Table III, Appendix B,
one-half of the 400 individuals having attribute A and one-half
having “not-A.” By use of random numbers, 170 samples of 10
each were selected from the population, each individual being
utilized as many times as his serial number occurred. (In effect,
this procedure makes our population infinitely large, so that the
probability of drawing an A individual remains constant.) The
distribution of the number of A's in the successive samples of 10 is
shown in Table 8.1. The theoretical relative frequencies were
obtained from the terms of the expansion $(1/2 + 1/2)^{10}$, and the
theoretical absolute frequencies were obtained by multiplying the
relative frequencies by 170. The histograms of observed and theo-
retical distributions are shown in Fig. 8.4.

The observed and expected frequencies are in fair agreement, as
are the observed and expected statistics, shown at the foot of
Table 8.1.


* The coefficient of any term of the expansion, if multiplied by the exponent
of p and divided by the number of the term, gives the coefficient of the next
term. Thus, if we multiply the coefficient of the fourth term by N − 3 and
divide by 4 we shall obtain as the coefficient of the fifth term

$$
\frac{N(N-1)(N-2)(N-3)}{(1)(2)(3)(4)}.
$$

The numerical values of the coefficients for N's from 2 to 12 are given in the
following chart.


 N   COEFFICIENTS

 2   1   2   1
 3   1   3   3   1
 4   1   4   6   4   1
 5   1   5  10  10   5   1
 6   1   6  15  20  15   6   1
 7   1   7  21  35  35  21   7   1
 8   1   8  28  56  70  56  28   8   1
 9   1   9  36  84 126 126  84  36   9   1
10   1  10  45 120 210 252 210 120  45  10   1
11   1  11  55 165 330 462 462 330 165  55  11   1
12   1  12  66 220 495 792 924 792 495 220  66  12   1


The extension of the chart for values of N up to about 20 is not difficult. Any
number plus the number on its left gives the number just below.


TABLE 8.1
OBSERVED AND THEORETICAL FREQUENCIES OF AN
ATTRIBUTE A IN 170 SAMPLES OF 10 EACH FROM A
POPULATION IN WHICH THE PROPORTION OF A's
IS .5


NUMBER      OBSERVED      THEORETICAL FREQUENCY
OF A's      FREQUENCY     RELATIVE      ABSOLUTE
  10            3           .001            .2
   9            1           .010           1.7
   8           11           .044           7.5
   7           13           .117          19.9
   6           39           .205          34.8
   5           42           .246          41.8
   4           36           .205          34.8
   3           17           .117          19.9
   2            5           .044           7.5
   1            3           .010           1.7
   0            0           .001            .2
 SUM          170          1.000         170.0

            OBSERVED      THEORETICAL
 $M$           5.12           5.00
 $\sigma$      1.67           1.58
 $\alpha_3$     .26           0.0
 $\alpha_4$    3.16           2.80


Fig. 8.4. Histograms of observed (—) and theoretical (- - -)
sampling distributions. (From Table 8.1.)

Many such experiments in the sampling of attributes have been
performed, and the observed frequencies have generally been found 
to be in good agreement with the frequencies expected under the 
theoretical binomial distribution. It may be concluded that both 
theory and experience indicate that the sampling distribution of 
attributes follows the binomial pattern. 
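
An experiment of this kind is easily imitated with pseudo-random numbers. The sketch below does not reproduce the procedure of Table III, Appendix B; it simply draws each individual as an A with probability 1/2, and the seed and function name are arbitrary choices made for illustration.

```python
import random
from collections import Counter
from math import comb

def simulate_counts(n_samples=170, sample_size=10, p=0.5, seed=0):
    """Draw `n_samples` random samples and tally the number of A's in
    each; frequencies are returned for 10, 9, ..., 1, 0 A's."""
    rng = random.Random(seed)
    counts = Counter(
        sum(rng.random() < p for _ in range(sample_size))
        for _ in range(n_samples)
    )
    return [counts.get(k, 0) for k in range(sample_size, -1, -1)]

observed = simulate_counts()
# Theoretical absolute frequencies: 170 times the terms of (1/2 + 1/2)^10.
expected = [170 * comb(10, k) / 2**10 for k in range(10, -1, -1)]

for k, obs, exp in zip(range(10, -1, -1), observed, expected):
    print(f"{k:2d} A's: observed {obs:3d}, theoretical {exp:5.1f}")
```

Different seeds give different observed frequencies, each set departing from the theoretical frequencies only by sampling fluctuations.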

Use of the Normal Curve in Approximating Sums of Terms
of the Binomial. We have noted that the successive terms of the
expansion of the binomial $(p + q)^N$ give the probabilities of N,
N − 1, N − 2, ..., 2, 1, 0 A's in samples of size N from a twofold
population in which the proportion of A's is p. The numerical
value of any term is the exact probability of a random sample of
size N containing the attribute A in number equal to the exponent
of p in that term.

Let us make use of this fact in determining the probability of
obtaining 12, 11, 10, or 9 A's in a random sample of 12 from a
population in which the proportion p of A's is .4. The first term
of the expansion $(.4 + .6)^{12}$ gives the probability of 12 A's, the
second term the probability of 11 A's, the third of 10 A's, and the
fourth of 9 A's. Then by the addition rule, the probability P of a
sample of 12, 11, 10, or 9 A's is

$$
P = (.4)^{12} + 12(.4)^{11}(.6) + \frac{(12)(11)}{(1)(2)}(.4)^{10}(.6)^{2}
  + \frac{(12)(11)(10)}{(1)(2)(3)}(.4)^{9}(.6)^{3}.
$$

By the use of logarithms we obtain, to three-figure accuracy,
P = .0153.

It is always possible to determine the probability of samples con- 
taining more than or less than a specified number of A’s by summing 
the appropriate terms of the binomial expansion. The amount of 
labor involved, however, practically prohibits the procedure, except 
in quite small samples. Fortunately, the normal curve usually can 
be used to estimate satisfactorily the sum of specified successive 
terms of the expansion.

The frequency distribution of the binomial $(.4 + .6)^{12}$ is:


NUMBER      RELATIVE
OF A's      FREQUENCY
  12          .0000
  11          .0003
  10          .0025
   9          .0125
   8          .0420
   7          .1009
   6          .1766
   5          .2270
   4          .2128
   3          .1419
   2          .0639
   1          .0174
   0          .0022

 SUM         1.0000

The histogram of the distribution and a superimposed normal
curve are shown in Fig. 8.5. Since the areas of the rectangles of
the histogram correspond to the respective relative frequencies,
the total area of the histogram corresponds to unity. Assuming the
distribution normal, we may use the area relationships of Table
A, Appendix C, to approximate the sum of the areas of the four
right-hand rectangles of the histogram. The procedure is quite
similar to that described previously on pp. 196-197. By formulas
(8.1) the mean of the distribution is 12(.4) or 4.8, and the standard
deviation $\sqrt{12(.4)(.6)}$ or 1.7. Now in order to treat the distribution
as continuous, we must consider the rectangles of the histogram
to extend 1/2 unit below and 1/2 unit above the indicated integral
values shown at the base of the histogram in the illustration.
Hence, the left boundary of the area of the four right-hand rec-
tangles is located at 8.5, as shown in the figure. In standard devi-
ation units, the distance between 8.5 and the mean is (8.5 − 4.8)/1.7
or 2.18. Entering Table A at 2.18, we read .4854. Subtracting .4854
from .5000 we obtain .0146 as the area of the four right-hand rec-
tangles. Since .0146 is the approximate sum of the first four terms
of the expansion $(.4 + .6)^{12}$ it is the approximate probability that,
in the given situation, a sample of 12 will contain 12, 11, 10, or 9
(i.e., 9 or more) individuals having the attribute in question. This
is in quite good agreement with the probability figure we obtained
by actually adding the numerical values of the first four terms of
the expansion.

Fig. 8.5. Histogram of binomial $(.4 + .6)^{12}$ and
superimposed normal curve.
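
Both the exact sum and the normal-curve approximation can be reproduced in a few lines. This is a sketch under the assumptions of the worked example (N = 12, p = .4, boundary at 8.5); the function names are hypothetical, and the normal area is computed from the normal distribution function rather than read from Table A.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_upper_tail(k_min, n, p):
    """Sum of the binomial terms for k_min, k_min + 1, ..., n A's."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def normal_upper_tail(k_min, n, p):
    """Normal-curve approximation to the same sum, with the lowest
    rectangle taken to begin 1/2 unit below k_min."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return 1 - NormalDist(mean, sd).cdf(k_min - 0.5)

exact = exact_upper_tail(9, 12, 0.4)
approx = normal_upper_tail(9, 12, 0.4)
print(round(exact, 4), round(approx, 4))
```

The two figures should agree with the .0153 and .0146 of the text to within rounding.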

The normal curve areas may be used to approximate the value
of any single term of the binomial. For example, consider the ninth
term of the binomial $(.4 + .6)^{12}$. The rectangle of the histogram
which corresponds to this term extends from 3.5 to 4.5 or in standard
deviation units from about −.76 to −.18. From Table A, we find
the area subtended by this interval to be .2050. The computed
value of the ninth term is .2128.

The use of the normal curve in summing successive terms or in
finding the value of a single term of a binomial distribution results
in an approximation. When the distribution is sensibly normal, the
approximation is good; the more the distribution departs from
normality, the less accurate the approximation becomes. Consider,
for example, the values of the last four terms of the distribution
$(.1 + .9)^{20}$, as determined by normal curve procedure. The mean of
the distribution is 2 and the standard deviation is 1.34. The bound-
aries of the class intervals or rectangles of concern are, in standard
units, 1.12, .37, −.37, and −1.12. Hence, the values of the last
four terms, obtained from normal areas, are about .22, .29, .22, and
.13, respectively. The corresponding computed values are about .19,
.28, .27, and .12. The reason why the approximations tend to be
only fair is seen when we inspect the frequency polygon for N = 20
in Fig. 8.6. However, the approximations are close enough to pro-
vide a rough check on the computed values, as is usually the case.

Fig. 8.6. Frequency polygons of the binomial
$(.1 + .9)^N$ for N of 10, 20, and 50.
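
The comparison for the skew distribution with p = .1 and N = 20 may be sketched as follows. The function names are hypothetical, and one choice is an assumption made to match the figures quoted in the text: the rectangle for 0 A's is given the whole lower tail of the normal curve.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_term(k, n, p):
    """Computed value of the binomial term for exactly k A's."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_term(k, n, p):
    """Normal-curve area over the rectangle from k - 1/2 to k + 1/2.
    For k = 0 the whole lower tail is used (an assumption of this sketch)."""
    dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
    lower = dist.cdf(k - 0.5) if k > 0 else 0.0
    return dist.cdf(k + 0.5) - lower

for k in (3, 2, 1, 0):
    print(k, round(normal_term(k, 20, 0.1), 2), round(exact_term(k, 20, 0.1), 2))
```

The normal areas are symmetrical about the mean of 2, whereas the computed terms are not; this is why the agreement is only fair.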


The shape of the binomial distribution depends upon the values
of p, q, and N. If p = q the distribution is symmetrical; if p ≠ q
the distribution is skew. Examination of formulas (8.1) indicates
that, for given values of p and q, $\alpha_3$ approaches 0 and $\alpha_4$ approaches 3
as N becomes increasingly large. If N is infinitely large, $\alpha_3$ = 0
and $\alpha_4$ = 3. It can be shown that, under certain conditions of
approximation, the binomial approaches the normal curve as a
limit as N approaches infinity. (See exr. 2, p. 192.)

In the practical situation, we never have an infinitely large N; 
hence, the normal curve can never be said to exactly fit the binomial 
distribution. Nonetheless the fit usually is close enough to enable 
one to make satisfactory approximations by normal curve 
procedure. In judging whether the use of the normal curve is justified 
in summing specified terms of the distribution (p̃ + q̃)^N, the student 
will find the following thumb rule quite dependable: 


If the product of N times p̃ or q̃ (whichever is smaller) is greater than 5, 
the use of the normal curve is justified. If N times p̃ or q̃ is equal to or 
less than 5, but N is equal to or greater than 10 and α_3 less than .2, the 
normal curve still ordinarily gives sufficiently accurate results to be 
justified. 


The rule, like any thumb rule, needs to be applied with care. 
Approximations do not suddenly become acceptable or 
unacceptable, as p̃, q̃, and N take on different values. Moreover, 
approximations are better in some regions than in others of a given binomial. 
However, we usually are interested in the sum of the terms beyond 
a point some 2 or 2.5σ units from the mean. When this is the case, 
the rule tends to be a safe guide. 
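
The thumb rule can be written out as a short check. This is only a sketch under stated assumptions: the function name is hypothetical, the skewness of the binomial is taken to be (q − p)/√(Npq), which is the usual expression and is assumed to agree with formulas (8.1), and the skewness is compared with .2 in absolute value.

```python
import math

def normal_curve_justified(n, p):
    """Apply the thumb rule to the binomial (p + q)^n.

    Assumptions: skewness alpha_3 = (q - p) / sqrt(n * p * q), and the
    limit of .2 is applied to its absolute value.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    # First clause: N times the smaller of p and q exceeds 5.
    if n * min(p, q) > 5:
        return True
    # Second clause: N at least 10 and only slight skewness.
    alpha_3 = (q - p) / math.sqrt(n * p * q)
    return n >= 10 and abs(alpha_3) < 0.2

print(normal_curve_justified(90, 1 / 3))   # sampling distribution used later in the chapter
print(normal_curve_justified(20, 0.1))     # the skew distribution (.1 + .9)^20
```

As the text cautions, the result is a guide and not a sharp boundary between acceptable and unacceptable approximations.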

Testing Hypotheses about a Population Proportion. Up to 
this point our discussion of the binomial sampling distribution has 
been deductive. We began with populations in which p̃ was known 
and deduced the relative frequency or probability of various 
samples. In using the distribution to draw inferences about a popu- 
lation, we must reverse the argument. We must begin with a sample 
and must reason from the sample to the population. 

Let us approach the testing of hypotheses by reference to a 
concrete problem. In attempting to determine the extent of need for a 
tuition-free city college, the department of education in a large 
city took a random sample of 90 from the group of graduating 
seniors qualified for college. The population was considered to be 
the graduating seniors of the present and future years qualified for 
college, and the sample of 90 was considered to be a random sample 
from this population. A careful interview with each member of the 
sample and his parents disclosed that 21 of the 90 would attend a 
tuition-free city college if one were available. Is this evidence 
consistent with the statement made by an organization in favor of 
a city college that one qualified high school graduate in three would 
attend such a college? In statistical language, is the hypothesis 
p̃ = 1/3 tenable? 

As always, we begin the test by assuming the hypothesis to be 
true. If p̃ = 1/3, the sampling distribution of concern is the binomial 
(1/3 + 2/3)^90, with mean 30 and standard deviation 4.47. The 
question now is, does the observed frequency 21 differ sufficiently 
from 30, the frequency expected if the hypothesis is true, to 
discredit the hypothesis? The difference between observed and 
expected frequencies is, disregarding sign, 9. In order to use the 
normal curve in summing the terms of the binomial which differ 
from 30 by 9 or more, we subtract 1/2 from 9 and divide by the 
standard deviation. (Why?) This gives us, in standard deviation 
units, (9 − .5)/4.47 or 1.90. Since the deviation 1.90 falls short of 
significance at the 5 per cent level, the hypothesis p̃ = 1/3 cannot 
be rejected at that level. (See [b], Fig. 8.7.) The probability of an 
absolute deviation of 1.90 or more is, from the table of normal 
areas, .029 + .029 or about .06. 

Fig. 8.7. Two .05 regions of rejection in the binomial sampling 
distribution (1/3 + 2/3)^90: (a) a single .05 region in the left-hand 
tail; (b) a .025 region in each tail. 

In testing the hypothesis p̃ = 1/3 (or any other exact hypothesis), 
the (b) region of rejection in Fig. 8.7 ordinarily is used, since it 
guards equally well against the two alternatives, p̃ < 1/3 or 
p̃ > 1/3, one of which must obtain if the hypothesis p̃ = 1/3 is in 
fact false. (Pp. 385-388.) In this problem, however, there is a second 
plausible procedure. The hypothesis that p̃ is as great as, i.e., not 
less than, 1/3 may be of first concern. To test the hypothesis 
p̃ ≥ 1/3, the (a) region of rejection of Fig. 8.7 is appropriate, since 
it minimizes the risk of accepting the hypothesis if p̃ is in fact less 
than 1/3.* The difference in standard deviation units between 
observed and expected frequencies is, retaining the algebraic sign, 
−1.90. This falls well within the (a) region of rejection of Fig. 8.7, 
and consequently the hypothesis p̃ ≥ 1/3 can be rejected at the 
5 per cent level. The probability of a deviation as small as −1.90σ is, 
from the table of normal areas, about .03, so that actually the 
hypothesis can be rejected at the 3 per cent level. 

* To test the hypothesis p̃ ≤ 1/3 at the 5 per cent level, the deviation would 
be referred to a 5 per cent region of rejection located at the right-hand tail. 
(See [c], Fig. 8.2.) This test would be made, of course, only if the deviation 
between observed and expected frequencies were positive in sign. 

If the 1 per cent level of significance were employed, the bounding 
ordinate of the (a) arrangement of Fig. 8.7 would be at −2.33σ, 
and the bounding ordinates of the (b) arrangement at −2.58σ and 
+2.58σ. Other levels of significance might of course be charted. 

Formulas and Procedures. The above discussion may be 
summarized as steps in a procedure. First, let us generalize the 
computations we carried out in a formula, using the symbol CR (critical 
ratio) to designate the difference in standard units between observed 
and theoretical frequencies. We may write 

$$\mathrm{CR} = \frac{|f_o - f_t| - 1/2}{\sqrt{N\tilde{p}\tilde{q}}} \tag{8.2}$$

in which f_o and f_t are the observed and theoretical frequencies, 
respectively; N, as always, the number in the sample; p̃ the 
hypothesized true proportion and q̃ = 1 − p̃. 

If preferred, the proportion of individuals in the sample having 
the attribute may be utilized instead of the absolute frequency or 
number of individuals. If the sample proportion p is used, formula 
(8.2) becomes 

$$\mathrm{CR} = \frac{|p - \tilde{p}| - 1/2N}{\sqrt{\tilde{p}\tilde{q}/N}} \tag{8.2'}$$

since to change to the proportions unit, the absolute frequency must 
be divided by N. In the illustrative example, p = 21/90 and 
p̃ = 1/3 so that by (8.2′) 

$$\mathrm{CR} = \frac{|21/90 - 1/3| - 1/180}{\sqrt{(1/3)(2/3)/90}}.$$

From this we obtain 1.90, as before. 
Since multiplying by 100 turns proportions into percentages, in 
the percentage unit formula (8.2′) becomes 

$$\mathrm{CR} = \frac{|P - \tilde{P}| - 50/N}{\sqrt{\tilde{P}\tilde{Q}/N}} \tag{8.2''}$$

in which the capital letters P and Q indicate percentages. 

The procedures in testing an hypothesis about a population 
proportion may be conveniently summarized in the following directions: 


a. Take a random sample from the specified twofold population 
and determine the absolute frequency, the proportion, or the 
percentage of individuals who have the attribute in question. 

b. State the hypothesis to be tested. If the hypothesis can be stated 
at least roughly before the sample is drawn, one can make sure 
that the sample is of sufficient size to justify the use of the normal 
curve areas in approximating sums of terms of the binomial. 

c. Compute the statistic CR by formula (8.2) if the absolute 
frequency unit is employed; by (8.2′) if the proportions unit is 
employed; or by (8.2″) if percentages are employed. 

d. If |CR| is equal to or greater than 1.96, the exact hypothesis that 
p̃ is equal to the proposed value can be rejected at the 5 per cent 
level of significance. If |CR| is equal to or greater than 2.58, the 
hypothesis can be rejected at the 1 per cent level. In practice, 
CR usually is referred to a table of normal areas, and the area 
lying beyond the ordinate at the value of CR is doubled. This 
gives the probability P of a sample proportion deviating from 
the hypothesized proportion by as much as the observed value 
of CR and indicates the level of significance of the deviation. 

e. If CR (returning to algebraic sign after the correction −1/2, 
−1/2N, or −50/N is made) is equal to or less than −1.64, the 
hypothesis that p̃ is equal to or greater than the proposed value 
can be rejected at the 5 per cent level of significance. If CR is 
equal to or less than −2.33, the hypothesis can be rejected at 
the 1 per cent level. The definite probability figure is given by 
the area under the normal curve lying to the left of the ordinate 
at the value of CR and indicates the level of significance. 

f. If CR is algebraically equal to or greater than +1.64, the 
hypothesis that p̃ is equal to or less than the proposed value can 
be rejected at the 5 per cent level of significance. If CR is equal 
to or greater than +2.33, the hypothesis can be rejected at the 
1 per cent level. The definite probability figure is given by the 
area under the normal curve lying to the right of the ordinate 
at the value of CR. 
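
Steps c through f may be sketched in a few lines of Python. The sketch is illustrative only: the function names are hypothetical, the normal areas come from the error function instead of a printed table, and the data are those of the city college problem (21 of 90, hypothesized proportion 1/3).

```python
import math

def normal_cdf(z):
    """Area under the unit normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def critical_ratio(f_obs, n, p_hyp):
    """CR in the absolute frequency unit, with the 1/2 correction for
    discontinuity applied before the algebraic sign is restored."""
    expected = n * p_hyp
    sd = math.sqrt(n * p_hyp * (1.0 - p_hyp))
    diff = f_obs - expected
    corrected = max(abs(diff) - 0.5, 0.0)
    return math.copysign(corrected / sd, diff)

cr = critical_ratio(21, 90, 1 / 3)

# Step d: exact hypothesis, area beyond |CR| doubled.
p_two_tailed = 2.0 * (1.0 - normal_cdf(abs(cr)))
# Step e: hypothesis "equal to or greater than", area to the left of CR.
p_left = normal_cdf(cr)
# Step f: hypothesis "equal to or less than", area to the right of CR.
p_right = 1.0 - normal_cdf(cr)

print(round(cr, 2), round(p_two_tailed, 3), round(p_left, 3), round(p_right, 3))
```

Which of the three probabilities is relevant depends on the hypothesis stated in step b; the program merely computes all three.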


If the use of the normal curve in approximating the binomial 
distribution is not justified, hypotheses can be tested only by 
actually summing the appropriate terms of the binomial. This may 
be so laborious as to be practically prohibitive; however, if N is 
less than 50 and p̃ and q̃ are no finer than hundredths, the terms 
may readily be summed by using the tables of the binomial 
probability distribution published by the National Bureau of Standards. 

Estimation of a Population Proportion from a Sample 
Proportion. It frequently happens that the investigator is 
interested in estimating p̃, rather than in testing a definite hypothesis. 

The best single-value or point estimate of p̃ is the sample 
proportion p, but, as was pointed out earlier, a point estimate is neither 
very useful nor very meaningful without some indication of the 
possible error in the estimate. In practical work, it usually is 
desirable to determine the 95 or 99 per cent confidence limits of p̃, and 
thus to establish a confidence interval. 

Let us return to the illustrative problem above in which p = .23 
and N = 90 to see how the 95 per cent confidence limits of the 
population proportion p̃ would be determined. We shall designate 
the lower limit p̃_L and the upper p̃_U. The value of p̃_L must be 
determined such that it would make the probability of getting the given 
sample value .23 or more just equal to .025, and the value of p̃_U 
must be determined such that it would make the probability of 
getting the given sample value .23 or less just equal to .025. Since 
the distribution is not independent of p̃, two distributions must be 
considered if p̃_L and p̃_U are to be determined exactly. This is a 
matter of some complexity (see Ref. 21), and in practice an 
approximation to the values of p̃_L and p̃_U would ordinarily be made from 
the single distribution (.23 + .77)^90, in which the sample proportions 
rather than the theoretical proportions are used. The mean of this 
distribution, in the proportions unit, is .23 and the standard 
deviation is √(.23 × .77/90) or .044. The distribution is well approximated 
by the normal curve; hence, the ordinates at .23 ± 1.96(.044) 
or .14 and .32 include about 95 per cent of the area. The values 
.14 and .32 are therefore the lower and upper 95 per cent confidence 
limits of the population proportion p̃. Since these limits are only 
approximate, the usual correction for the discontinuity of the 
binomial distribution is unwarranted. 

The interval .14-.32 is approximately the 95 per cent confidence 
interval. Any hypothesis that puts p̃ outside this interval may be 
rejected at least at about the 5 per cent level of significance. This is 
the connection between interval estimation and testing hypotheses, 
previously noted. However, a definite hypothesis concerning p̃ 
ordinarily would be tested by the usual method and a definite 
probability figure thus obtained. 

In general, when N is large enough to justify the use of the normal 
curve as an approximation to the binomial distribution, the lower 
and upper 95 per cent confidence limits of p̃ are satisfactorily 
approximated by the formulas 

$$\tilde{p}_L = p - 1.96\sqrt{pq/N}, \qquad \tilde{p}_U = p + 1.96\sqrt{pq/N} \tag{8.3}$$

in which p̃_L is the lower limit of p̃, p̃_U the upper limit, p the sample 
proportion, q = 1 − p, and N the number in the sample. 

The approximate 99 per cent confidence limits may be obtained 
by replacing 1.96 with 2.58 in formulas (8.3); the 90 per cent 
confidence limits by replacing with 1.64; and so on. 
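
Formulas (8.3) translate directly into a short routine. The sketch below is hedged in the usual way: the function name and its arguments are hypothetical, and the multiplier z is passed in so that 1.96 may be exchanged for 2.58 or 1.64. It is applied to the illustrative sample in which p = .23 and N = 90.

```python
import math

def proportion_confidence_limits(p, n, z=1.96):
    """Approximate confidence limits of a population proportion,
    using the sample proportion p in place of the unknown value."""
    q = 1.0 - p
    standard_error = math.sqrt(p * q / n)
    return p - z * standard_error, p + z * standard_error

lower_95, upper_95 = proportion_confidence_limits(0.23, 90)
lower_99, upper_99 = proportion_confidence_limits(0.23, 90, z=2.58)
print(round(lower_95, 2), round(upper_95, 2))
print(round(lower_99, 2), round(upper_99, 2))
```

No correction for discontinuity is applied, in keeping with the remark that the limits are only approximate.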

The Sign Test of Significance of Differences between Paired 
Data. There are a great many situations in which data are observed 
in pairs. In a single group experiment, for example, each member 
of the group is measured before and after some experimental 
treatment. In the parallel group experiment, each member of an 
experimental group may be matched with an individual in a control 
group and measurements made on the pairs. If we designate one 
set of measures X_i and the other Y_i, we can form a series of N 
differences, (X_1 − Y_1), (X_2 − Y_2), . . . , (X_N − Y_N). 

If each of these differences is distributed about a median (or 
mean) of zero, we would expect sampling fluctuations to result in 
positive and negative differences in about the same number. If, 
however, the various (X_i − Y_i) are not distributed about zero, 
i.e., if there are real differences in one direction, either positive or 
negative differences will predominate, depending upon the direction. 

Under the null hypothesis, the signs of the differences (X_i − Y_i) 
will be distributed according to the binomial model (1/2 + 1/2)^N, 
and the terms of the expansion will give the probability of N, 
N − 1, N − 2, . . . , 2, 1, 0 plus (or minus) signs. 

We shall illustrate the application of the sign test to data taken 
from Kuebler's study of the validity of paper-and-pencil personality 
inventories (Ref. 11). Kuebler administered an inventory to a 
group of 22 private school students, requesting that the inventories 
be returned unsigned. By hidden identification marks, he was able 
to match the unsigned inventories with the inventories administered 
a few days later, the latter being administered with the usual request 
for signatures. The numbers of “maladjustment” items checked 
in the two situations and the signs of the differences are shown 
below. 


NUMBER OF ITEMS CHECKED 

UNSIGNED    SIGNED    SIGN OF DIFFERENCE 
   37         23              + 
   44         34              + 
   55         59              − 
   70         25              + 
   26         16              + 
   39         12              + 
   26         16              + 
   30         25              + 
   85         60              + 
   83         69              + 
   14         14              0 
   36         36              0 
   59         70              − 
   59         61              − 
   36         29              + 
   81         70              + 
   58         52              + 
   19         34              − 
   11          1              + 
   45         42              + 
   14         14              0 
   19         13              + 

Kuebler considered the 22 subjects to be a random sample from 
the population of present and future students in his school to whom 
the inventory might be given. How sound is the hypothesis that 
there is no real difference between results obtainable by unsigned 
and signed administrations of the inventory in the population? 

There are 19 nonzero differences, 15 plus and 4 minus. Assuming 
that the hypothesis is true, the signs of the 19 differences would be 
distributed according to the model (1/2 + 1/2)^19. The mean of 
this distribution is 9.5, and the standard deviation is 2.18. Hence, 
the difference between observation and expectation in standard 
units is, by formula (8.2), 

$$\mathrm{CR} = \frac{|15 - 9.5| - 1/2}{2.18} = 2.29.$$

Since this value falls in the 5 per cent region of rejection, the 
hypothesis is rejected. To obtain a definite probability figure we enter 
the table of normal areas at 2.29 and, doubling the corresponding 
area, we have P = .02. It can be concluded that future signed and 
unsigned administrations of the inventory in question would very 
likely yield different results. 
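
The sign test on these data may be reproduced as follows. This is a sketch rather than a record of the original computation: the two lists are transcribed from the table above, the variable names are hypothetical, and an exact binomial probability is added beside the normal curve figure for comparison.

```python
import math

unsigned = [37, 44, 55, 70, 26, 39, 26, 30, 85, 83, 14,
            36, 59, 59, 36, 81, 58, 19, 11, 45, 14, 19]
signed = [23, 34, 59, 25, 16, 12, 16, 25, 60, 69, 14,
          36, 70, 61, 29, 70, 52, 34, 1, 42, 14, 13]

differences = [x - y for x, y in zip(unsigned, signed)]
plus = sum(1 for d in differences if d > 0)
minus = sum(1 for d in differences if d < 0)
n = plus + minus                      # zero differences are set aside

# Normal curve procedure, formula (8.2) with p = q = 1/2.
mean = n / 2.0
sd = math.sqrt(n) / 2.0
cr = (abs(plus - mean) - 0.5) / sd
p_normal = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0))))

# Exact two-tailed probability from the terms of (1/2 + 1/2)^n.
extreme = max(plus, minus)
tail = sum(math.comb(n, k) for k in range(extreme, n + 1)) / 2.0 ** n
p_exact = min(1.0, 2.0 * tail)

print(plus, minus, round(cr, 2), round(p_normal, 3), round(p_exact, 3))
```

With only 19 nonzero differences the exact sum is easily obtained, and it serves as a check on the normal curve approximation.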

It is sometimes informative to determine the probability that 
paired data differ by k units. When this is desired, the signs of the 
differences X_1 − (Y_1 + k), X_2 − (Y_2 + k), . . . , X_N − (Y_N + k) 
are determined and the sign test applied as usual. In the illustrative 
example, one might ask whether, in the population, it is reasonable 
to suppose that the number of items checked in the unsigned and 
signed administrations differ by 10. To answer the question, the 
sign test would be applied to the signs of the differences 37 − 
(23 + 10), 44 − (34 + 10), . . . , 29 − (13 + 10). By trying out 
other constants, say 2, 4, and 6, a range of admissible differences 
could be roughly established. 

The sign test of the significance of differences between paired 
data is easily applied. When data are not truly quantitative, as 
may be the case when rating scales or judgments are used, the 
sign test may be the only test which can logically be applied. When 
the differences are dependable, in the sense that competent observers 
would agree upon their existence and direction, and are independent, 
the sign test is appropriate. 

When data are truly quantitative, the sign test is useful primarily 
in rough or preliminary analysis. If the paired members of an 
experimental and a control group are measured on a great many 
variables, as is often the case, the sign test may be helpful in select- 
ing a few of the sets of data for more extensive and rigorous analysis. 


Exercises 


24. The proportion of individuals having attribute B in the population of 
400, Table III, Appendix B, is .25. Take a great many random samples 
of 10, determine the relative frequencies of 10, 9, . . . , 1, 0 B's, and 
compare with the expected relative frequencies shown below. The 
latter are obtained from the terms of the binomial (1/4 + 3/4)^10. 

NUMBER OF B's    EXPECTED RELATIVE FREQUENCY 
10, 9, or 8                 .001 
     7                      .003 
     6                      .016 
     5                      .058 
     4                      .146 
     3                      .250 
     2                      .282 
     1                      .188 
     0                      .056 

25. Find the mean, standard deviation, and alpha statistics for each of 
the following binomials: (1/3 + 2/3)", (.1 + .9)%, (3/4 + 1/4), 
(.05 + .95)%5. Which do not permit normal curve approximation? 

26. In a sample of 10 from a twofold population, 7 individuals having the 
given attribute were observed. How sound is the hypothesis that 
p̃ ≤ 1/5? Since the binomial (1/5 + 4/5)^10 cannot be approximated 
by the normal curve, the probability of a sample having 7 or more 
individuals with the given attribute must be determined by summing 
the first four terms of the binomial. 

27. Compare the exact values of the terms of the binomial (1/2 + 1/2)^10 
with the values obtained by normal curve approximations. 

28. An experimenter reported that, in tossing a coin 400 times, he had 
observed 250 heads. Comment. 

29. Eleven children in a sample of 100 were found to be left-handed. Is this 
consistent with the hypothesis that 6 per cent of the children in the 
population are left-handed? Is it consistent with the hypothesis that 
not more than 6 per cent are left-handed? 

30. The publisher of a children’s encyclopedia made the claim that 90 per 
cent or more of the words in the encyclopedia were on a standard word 
list. An elementary supervisor examined a random sample of 4 pages 
containing 1,200 words and found 200 words which were not on the 
word list. Does this finding throw doubt on the publisher's claim? 

31. (Kinney) A sociologist who is interested in the characteristics of a 
certain race which we will call R, hit on the idea of trying to sort R’s 
from non-R’s in the writings of unknown persons. Accordingly he 
persuaded a colleague to let him have 64 examination papers, with 
names removed, from the psychology class at Blank University. On 
43 of these papers he correctly spotted the students as R’s or non-R's. 
In 21 cases he missed. Find the probability of this performance having 
resulted from pure chance. 


The Normal Sampling Distribution 


When a population is normal in form, a great many sample sta- 
tistics are distributed normally or nearly so; in fact, as sample size 
increases the sampling distributions of most of the commonly used 
statistics approach normality. 

When it can be assumed that the sampling distribution of a 
statistic is normal and when the standard deviation of the distribu- 
tion can be determined, inferences about the corresponding popu- 
lation parameter can readily be made by use of a table of normal 
areas. The area under the curve corresponding to a specified interval 
on the base line indicates the relative frequency or probability of 
sample values falling in the interval, and hence provides a con- 
venient criterion for testing hypotheses and estimating confidence 
intervals. 

We shall find little that is new in this application of the normal 
curve. In estimating confidence intervals of predicted and true 
scores and in making inferences about population proportions, we 
used normal curve relationships in exactly this way. The applica- 
tion may readily be extended to the mean or any other statistic 
whose sampling distribution approximates normality. 

The Sampling Distribution of the Mean. It can be shown 
analytically that the means of all possible samples of size N from 
a normal population are distributed normally, and that the mean 
of the distribution (the mean of means) is equal to the mean of the 
population and the standard deviation equal to the standard 
deviation of the population divided by the square root of N. 

The demonstration is beyond the scope of this book, but we can 
check and clarify the statements experimentally. The distribution 
of the means of 120 samples of 25 each is shown in Table 8.2. 


TABLE 8.2 
MEAN VALUES OF 120 SAMPLES OF 25 FROM A NORMAL 
POPULATION IN WHICH M̃ = 40.0 AND σ̃ = 10.0 

MEAN OF SAMPLE    FREQUENCY 
    45.5-              1 
    44.5-              0 
    43.5-              3 
    42.5-              7 
    41.5-             14 
    40.5-             22 
    39.5-             25 
    38.5-             18 
    37.5-             16 
    36.5-              8 
    35.5-              3 
    34.5-              3 


The samples were drawn at random from the normal population 
of 400 scores, Table III, Appendix B, in which M̃ = 40.0 and 
σ̃ = 10.0. The distribution of means approximates normality 
closely (see Fig. 8.8) with mean of 39.8 and standard deviation, 
corrected for grouping, of 2.02. These values are in good agreement 
with the theoretically correct values 40.0 and 10.0/√25 or 2.00, 
respectively; in fact, the agreement is unusually good, considering 
that our experimental distribution comprises only 120 of the many 
possible samples of 25 each from this population of 400. 
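
The summary figures for Table 8.2 can be recomputed from the grouped frequencies. The sketch below rests on two assumptions that are not stated in the text: that the interval marked 34.5- has midpoint 34.95 (as the base line of Fig. 8.8 suggests), and that the correction for grouping is Sheppard's correction, i.e., the subtraction of one-twelfth of the squared interval width from the variance. Variable names are hypothetical.

```python
import math

# Lower score limits and frequencies from Table 8.2.
lower_limits = [45.5, 44.5, 43.5, 42.5, 41.5, 40.5,
                39.5, 38.5, 37.5, 36.5, 35.5, 34.5]
frequencies = [1, 0, 3, 7, 14, 22, 25, 18, 16, 8, 3, 3]
width = 1.0

# Assumption: means recorded to tenths, so "34.5-" covers 34.5 to 35.4
# and its midpoint is 34.95.
midpoints = [lo + 0.45 for lo in lower_limits]

total = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / total
variance = sum(f * (x - mean) ** 2
               for f, x in zip(frequencies, midpoints)) / total

# Assumption: "corrected for grouping" means Sheppard's correction.
corrected_sd = math.sqrt(variance - width ** 2 / 12.0)

# Theoretical standard deviation of the means: 10.0 / sqrt(25).
theoretical_sd = 10.0 / math.sqrt(25)

print(total, round(mean, 2), round(corrected_sd, 2), theoretical_sd)
```

The computed mean and standard deviation may be set beside the theoretical values 40.0 and 2.00 in the same way as in the text.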

Many such experiments have been made and they support the 
theoretically exact conclusion stated above, namely, that when a 
population is normal in form with mean M̃ and standard deviation 
σ̃, the sampling distribution of the mean of samples of size N is 
normal with mean M̃ and standard deviation σ̃/√N. The latter 
quantity is generally known as the standard error of the mean (see 
p. 420) and is designated σ_M or σ_X̄. Using this notation we have 

$$\sigma_M = \tilde{\sigma}/\sqrt{N}. \tag{8.4}$$


Fig. 8.8. Histogram of distribution of 120 sample means and 
superimposed normal curve. (From Table 8.2.) Horizontal axis: mean 
of sample; vertical axis: frequency. 

The relationship expressed in (8.4) obtains exactly when the 
population is normal with known standard deviation σ̃. It is exact 
enough to be useful in inference, even though the population shows 
considerable departure from normality (see p. 447) and σ̃ is 
unknown, provided the sample is not very small, say not less than 
about 30. When N is about 30 or more, the standard deviation σ 
of the sample may be substituted for σ̃ in formula (8.4) without 
introducing serious error. Making the substitution, we have 


$$\sigma_M = \sigma/\sqrt{N} \quad \text{(approximately)}, \tag{8.5}$$


which gives the standard error of the mean in terms of sample 
statistics. 

The approximation in (8.5) is somewhat better if N is replaced 
by N − 1. The standard deviation of the sample tends to 
underestimate the standard deviation of the population. A better 
estimate of σ̃ results when the standard deviation σ of the sample is 
multiplied by √(N/(N − 1)). This correction for underestimation, 
however, does not overcome a more serious limitation. When N 
is so small that the correction makes a material difference, the 
normal probability scale should not be used in making inferences, 
as will be brought out in connection with the t sampling distribution. 
For this reason the use of N − 1 instead of N in (8.5) is not 
recommended here. 

Since the standard error of the mean can be approximated from 
sample statistics and since the sampling distribution of the mean 
tends to be normal in form, it follows that normal curve relation- 
ships can be used in making inferences about a population mean, 
provided the sample is not less than about 30 in size. 

Testing Hypotheses about Population Means. The general 
procedures in testing hypotheses and controlling risks of the Type I 
and Type II inferential mistakes are readily applied to the sampling 
distribution of the mean. The three sets of .05 regions of rejection 
in Fig. 8.2 are, for the sampling distribution of the mean, determined 
by the points at −1.64σ_M, ±1.96σ_M, and +1.64σ_M, as shown in 
Fig. 8.9. The .01 regions would be determined by the points at 
−2.33σ_M, ±2.58σ_M, and +2.33σ_M, respectively. The points 
determining other sets of regions of rejection may readily be determined 
from the table of normal areas. The reasons for selecting a region 
of rejection of particular size and location were discussed on pages 
385-388. 

Fig. 8.9. Three .05 regions of rejection in the sampling 
distribution of the mean: (a) below −1.64σ_M; (b) beyond ±1.96σ_M; 
(c) above +1.64σ_M. 

To test an hypothesis about the mean M̃ of a population we need 
only to determine whether the difference between the mean of a 
sample and the mean proposed by the hypothesis is, in standard 
units, sufficiently large to fall in the particular region of rejection 
selected. If we use CR* to designate a difference or deviation from 
the mean of a normal sampling distribution expressed in standard 
units, we have 

$$\mathrm{CR} = \frac{M - \tilde{M}}{\sigma_M} = \frac{M - \tilde{M}}{\sigma/\sqrt{N}} \tag{8.6}$$

in which M is the mean of the sample, M̃ is the proposed or 
hypothesized mean of the population, and σ_M is the standard error of the 
mean. This is the crucial statistic for testing an hypothesis about 
the mean. 

If CR is −1.64 or less, the hypothesis that M̃ is equal to or greater 
than the value proposed can be rejected at the 5 per cent level of 
significance; if CR is −2.33 or less, the hypothesis can be rejected 
at the 1 per cent level. The normal curve area to the left of CR 
gives the probability P of a sample deviation as great as the one 
observed, if the hypothesis is true, and hence indicates the level 
of significance at which the hypothesis can be rejected. 

If the absolute value of CR equals or exceeds 1.96, the hypothesis 
that M̃ is equal to the value proposed can be rejected at the 5 per 
cent level; if the absolute value of CR equals or exceeds 2.58, the 
hypothesis can be rejected at the 1 per cent level. To obtain the 
probability P of an absolute deviation as great as the one observed, 
the normal curve area lying beyond CR is doubled. The value of P 
indicates the level of significance at which the hypothesis can be 
rejected. 


* The symbol CR (critical ratio) is widely used to designate a difference or 
deviation from the expected value or mean of a normal sampling distribution 
expressed in standard units, i.e., 

$$\mathrm{CR} = \frac{\text{Difference between observed and expected value of a statistic}}{\text{Standard deviation of the sampling distribution of the statistic}}$$

The symbol is not particularly appropriate. The ratio is not critical, in the usual 
sense of the word. Since the ratio is merely a standard score in a normal 
sampling distribution, the symbol z or z′ would be more appropriate. However, 
the CR notation has the advantage of single meaning and wide use. 

It is customary to carry computations of CR's to hundredths, whether or 
not the accuracy is warranted. This practice permits ready reference of a CR 
to a table of normal areas, such as Table A. 

If CR is +1.64 or more, the hypothesis that M̃ is equal to or 
less than the value proposed can be rejected at the 5 per cent level; 
if CR is +2.33 or more, the hypothesis can be rejected at the 1 per 
cent level. The normal curve area lying to the right of CR gives 
the probability P of a deviation as great as the one observed and 
indicates the level of significance at which the hypothesis can be 
rejected. 

The procedures in testing hypotheses about population means 
are illustrated in the following examples: 


EXAMPLE 1. The mean working week of a sample of 250 
Pennsylvania high school teachers is 44 hours and the standard 
deviation is 5 hours. Does this evidence contradict the view 
that the mean in the population may in fact be as low as 
40 hours? 

Assuming that the population mean is 40, the mean of the 
sampling distribution is 40 with standard error 5/√250. The 
hypothesis to be tested is M̃ ≤ 40. We can reject the 
hypothesis at the 5 per cent level if the difference between sample 
mean and hypothesized mean is, in standard units, +1.64 or more. 
(See [c], Fig. 8.9.) We compute the critical ratio 

$$\mathrm{CR} = \frac{44 - 40}{5/\sqrt{250}} = 12.6.$$

Since this CR is much larger than the value +1.64 needed, we 
reject the hypothesis. In fact the probability P corresponding 
to a CR of 12.6 is microscopically small. (Why?) We can be 
very sure, assuming reliable data and sound sampling 
procedures, that the mean working week of Pennsylvania high 
school teachers is not as low as 40 hours. 

EXAMPLE 2. The mean of the 138 VAT scores, Table II, 
Appendix B, is 550.37 and the standard deviation is 92.73. 
Is it reasonable to suppose that the population represented 
by this sample of 138 has a mean of 560? 

The hypothesis to be tested is M̃ = 560. If the hypothesis 
is true, the mean of the sampling distribution is 560 with 
standard error 92.73/√138. Dividing the difference between 
the sample mean and the hypothesized population mean by 
the standard error, we have the critical ratio 

$$\mathrm{CR} = \frac{550.37 - 560}{92.73/\sqrt{138}} = -1.22.$$

To reject the hypothesis M̃ = 560 at the 5 per cent level of 
significance, we need a |CR| equal to or greater than 1.96. 
(See [b], Fig. 8.9.) Since the present CR is −1.22, we accept 
the hypothesis. The corresponding P in this case is .22. 
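
Both examples follow formula (8.6) and can be verified with a brief program. The sketch below is illustrative only; the function names are hypothetical, and the probabilities are taken from the error function instead of Table A.

```python
import math

def normal_cdf(z):
    """Area under the unit normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def critical_ratio_mean(sample_mean, hypothesized_mean, sample_sd, n):
    """CR of formula (8.6): deviation of the sample mean from the
    hypothesized mean in units of the standard error (8.5)."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Example 1: hypothesis that the population mean is 40 or less.
cr_1 = critical_ratio_mean(44, 40, 5, 250)
p_1 = 1.0 - normal_cdf(cr_1)            # area to the right of CR

# Example 2: exact hypothesis that the population mean is 560.
cr_2 = critical_ratio_mean(550.37, 560, 92.73, 138)
p_2 = 2.0 * (1.0 - normal_cdf(abs(cr_2)))   # area beyond |CR|, doubled

print(round(cr_1, 2), p_1)
print(round(cr_2, 2), round(p_2, 2))
```

The one-tailed area is used in the first example and the doubled area in the second, in accordance with the form of the hypothesis in each case.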


Estimation of a Population Mean. The maximum likelihood 
or “best” estimate of a population mean M̃ that can be made from 
a single sample is the mean M of the sample. It is usually desirable, 
however, to know the extent to which the point estimate may be 
in error, and hence to determine the interval which has a specified 
probability (usually .95 or .99) of covering the value of the 
population mean. 

Fig. 8.10. The lower and upper limits of the 95 per cent confidence 
interval of the population mean are 1.96σ_M below and above the sample 
mean. 

То determine the symmetrical 95 per cent confidence interval we 
proceed as follows. We locate the lower limit $\tilde{M}_L$ of the interval
such that, if $\tilde{M}_L$ were the population mean, the probability of a
sample mean equal to or greater than the observed sample mean M
would be .025. By the table of normal areas, this limit lies
$1.96\sigma_M$ below M, i.e., .025 of the area under the curve lies beyond
1.96 standard units. Hence the lower 95 per cent confidence limit
$\tilde{M}_L$ of the population mean is $M - 1.96\sigma_M$. (See Fig. 8.10.)

This is the same result as would be obtained if M were considered
to be the mean of the sampling distribution and $\tilde{M}_L$ located at a
distance $1.96\sigma_M$ below M. But if this were done, the reasoning
would be incorrect. If $\tilde{M}$ is in fact equal to $\tilde{M}_L$, then $\tilde{M}_L$ is the mean of
the sampling distribution, as indicated in the figure. The hypothesized
or assumed value of $\tilde{M}$ must always be thought of as the mean
of the sampling distribution. Otherwise the implication would be
that $\tilde{M}$ is a variable, which would of course be nonsense. The
population mean can have one and only one value.

By similar procedure and reasoning the upper limit $\tilde{M}_U$ of the
population mean is located at $M + 1.96\sigma_M$. We may express the
procedure in the formulas

$$\tilde{M}_L = M - 1.96\sigma_M, \qquad \tilde{M}_U = M + 1.96\sigma_M, \tag{8.7}$$

which give the 95 per cent limits of $\tilde{M}$ in terms of the observed
sample mean M. Since the interval $M \pm 1.96\sigma_M$ has a .95 probability
of covering the value of $\tilde{M}$, it is the 95 per cent confidence interval
of $\tilde{M}$.

EXAMPLE. The mean of the 138 VAT scores, Table II, Appendix B,
is 550.37 and the standard deviation is 92.73. Determine
the 95 per cent confidence interval of the population
mean.

The sample mean M is 550.37 and $\sigma_M$ is $92.73/\sqrt{138}$ or
7.89. Substituting these values in formulas (8.7) we obtain
$\tilde{M}_L = 534.91$ and $\tilde{M}_U = 565.83$. The 95 per cent confidence
interval of the population mean $\tilde{M}$ thus is 534.91-565.83.
Any hypothesis proposing a value of $\tilde{M}$ outside this interval
can be rejected at the 5 per cent level.
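
The limits in this example may be verified as follows. This is a minimal sketch under the same normal-curve assumption; the name `mean_confidence_interval` is hypothetical, and the multiplier is taken from `statistics.NormalDist` rather than from the printed table, so results agree with the text only to within rounding.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sd, n, level=0.95):
    """Symmetrical confidence limits: sample mean plus and minus z times the SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is .95
    standard_error = sd / sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


lower, upper = mean_confidence_interval(550.37, 92.73, 138)
print(round(lower, 1), round(upper, 1))
```

Passing `level=0.99` or `level=0.90` gives the limits based on the multipliers 2.58 and 1.64.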


The 90 per cent confidence limits of $\tilde{M}$ are located at $M \pm 1.64\sigma_M$;
the 99 per cent confidence limits at $M \pm 2.58\sigma_M$; and so on, as
readily deduced from a table of normal areas.

Standard Error; Probable Error. The standard deviation of 
the sampling distribution of a statistic is universally known as the 
standard error of the statistic. A standard error is conventionally
symbolized either as SE or $\sigma$ with appropriate subscript. Thus,
$SE_M$, $\sigma_M$, and $\sigma_{\bar{x}}$ refer to the standard error of a mean; $SE_p$ and $\sigma_p$
refer to the standard error of a proportion; and so on. The square
of the standard error of a statistic is known as the variance error
of the statistic; thus, $\sigma_M^2$ is the variance error of a mean.

The standard errors of various commonly used statistics are listed
in Table 8.4. These standard errors are merely the standard devia- 
tions of the sampling distributions of the various statistics. As we 
shall see later, they are used in making inferences about population 
parameters in exactly the same way that the standard error of the 
mean is used. 

In the past the term probable error, commonly abbreviated to PE, 
has been widely used in educational work. It is usually defined as 
the deviation in the sampling distribution which has a probability 
of .5, and, as such, is the deviation which when laid off below and 
above the expected value or mean of the sampling distribution, 
establishes the interval comprising 50 per cent of the sample values. 
It is thus equivalent to the quartile deviation and is numerically 
equal to .6745 times the standard error. 

For several reasons, the use of "probable error" in place of 
"standard error" might well be abandoned. First, the term is mis- 
leading, because it is not the most probable error. By the normal 
law an error or deviation equal to, say, 1/2 PE obviously is more 
probable than one equal to 1 PE. Second, it is rarely the case that 
the 50, rather than the 95 or 99, per cent confidence interval of a 
statistic is desirable. Third, in computing the probable error, the
standard error ordinarily is first computed and then multiplied by
.6745. This is wasted work, since a discrepancy between an observed
and an expected value expressed in PE units can be referred to a
table of normal areas no more readily than the discrepancy expressed
in SE units.
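
The factor .6745 and the first objection above can both be illustrated with the unit normal curve. The sketch below is only a check; `pe_factor` and `pe_units_to_se_units` are hypothetical names, and `statistics.NormalDist` is assumed in place of a table of normal areas.

```python
from statistics import NormalDist

unit_normal = NormalDist()

# Deviation, in SE units, that marks off the middle 50 per cent of a normal curve.
pe_factor = unit_normal.inv_cdf(0.75)


def pe_units_to_se_units(discrepancy_in_pe):
    """Convert a discrepancy stated in PE units to SE units."""
    return discrepancy_in_pe * pe_factor


# Ordinates of the normal curve at 1/2 PE and at 1 PE from the mean:
# the smaller deviation is the more probable one.
half_pe_ordinate = unit_normal.pdf(pe_units_to_se_units(0.5))
one_pe_ordinate = unit_normal.pdf(pe_units_to_se_units(1.0))
print(round(pe_factor, 4), half_pe_ordinate > one_pe_ordinate)
```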

Inferences about Population Proportions. As demonstrated 
in the preceding section, in sampling from a twofold population the 
proportions of individuals having a given attribute in successive 
samples follow the binomial sampling distribution. The use of the 
normal curve to approximate the binomial distribution, however, 
amounts to treating the sampling distribution of the proportion as
though it were in fact continuous and normal, with mean equal to
the population proportion $\tilde{p}$ and standard error equal to $\sqrt{\tilde{p}\tilde{q}/N}$.

If the correction for discontinuity of the binomial, ordinarily
negligible in practice, is ignored, formula (8.2') becomes

$$CR = \frac{p - \tilde{p}}{\sqrt{\tilde{p}\tilde{q}/N}}, \tag{8.8}$$

in which p is the sample proportion and $\tilde{p}$ is the hypothesized
population proportion. It will be noted that formula (8.8) parallels (8.6),
indicating that inferences about population proportions may be
made in the same manner as inferences about population means.


EXAMPLE. In a random sample of 36 taxpayers, 27 or .75 are in 
favor of a proposed bond issue. Can the view that the tax- 
payers are evenly divided on the proposal be rejected at the 
1 per cent level? 

The hypothesis to be tested is $\tilde{p} = .50$. Since p = .75 and
N = 36, substitution in formula (8.8) results in

$$CR = \frac{.75 - .50}{\sqrt{(.50)(.50)/36}} = 3.00.$$

The value of CR exceeds 2.58, and the hypothesis $\tilde{p} = .50$
can be rejected at the 1 per cent level of significance. To obtain
a probability figure, we enter the table of normal areas at 3.00
and double the area lying beyond this point. This gives
P = .003.
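
The bond-issue example can be reproduced with a few lines. This is a sketch only, with the hypothetical function name `proportion_critical_ratio`; it ignores the correction for discontinuity, as the text does, and uses `statistics.NormalDist` for the normal areas.

```python
from math import sqrt
from statistics import NormalDist


def proportion_critical_ratio(p_sample, p_hypothesized, n):
    """(p - hypothesized p) divided by sqrt(pq/N), q taken from the hypothesis."""
    q_hypothesized = 1 - p_hypothesized
    return (p_sample - p_hypothesized) / sqrt(p_hypothesized * q_hypothesized / n)


# 27 of 36 taxpayers in favor; hypothesis: the population is evenly divided.
cr = proportion_critical_ratio(27 / 36, 0.50, 36)
p_value = 2 * (1 - NormalDist().cdf(abs(cr)))  # double the area beyond |CR|
print(round(cr, 2), round(p_value, 3))
```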


To estimate confidence intervals of $\tilde{p}$, we would follow exactly
the procedure outlined on page 409, since there we treated the
sampling distribution of the proportion as though it were in fact
continuous, normal, and independent of $\tilde{p}$.

Inferences about Population Product-Moment Coefficients
of Correlation. In sampling from a normal bivariate population,
the form of the sampling distribution of the product-moment
coefficient of correlation is independent neither of the population
parameter $\tilde{r}_{xy}$ nor of the size N of the sample.

If the absolute value of $\tilde{r}_{xy}$ is large or if N is small, the distribution
is markedly skewed, but the skewness decreases steadily as $\tilde{r}_{xy}$
decreases or as N increases. When $|\tilde{r}_{xy}|$ is about .50 or less and N is
about 60 or more, the sampling distribution tends toward normality
with mean approximately equal to $\tilde{r}_{xy}$ and standard deviation of
approximately $(1 - \tilde{r}_{xy}^2)/\sqrt{N}$. If the sample value is substituted
for $\tilde{r}_{xy}$, as is ordinarily done, this standard error is

$$\sigma_{r_{xy}} = \frac{1 - r_{xy}^2}{\sqrt{N}}. \tag{8.9}$$

Under the conditions stated above, this standard error and the
normal probability scale may be used in making inferences about
$\tilde{r}_{xy}$. The procedure is not the best now available, however, and
therefore cannot be recommended.

Fisher (Ref. 4) has shown that a simple logarithmic transformation
of the product-moment coefficient of correlation is distributed
normally, to a very close approximation, regardless of the size of the
sample or the value of $\tilde{r}_{xy}$. Fisher's transformation, which we shall
designate the $z_r$ transformation, is

$$z_r = \tfrac{1}{2}\log_e\frac{1 + r_{xy}}{1 - r_{xy}}, \tag{8.10}$$

in which $r_{xy}$ is the sample coefficient. The mean of the $z_r$ sampling
distribution is the population parameter $\tilde{z}_r$ and the standard
deviation is

$$\sigma_{z_r} = \frac{1}{\sqrt{N - 3}}. \tag{8.11}$$

As a check and clarification of theory, 92 random samples of 25
each were taken from a normal bivariate population in which $\tilde{r}_{xy}$
was .85. The distribution of the 92 sample $r_{xy}$'s is shown in the
left-hand half of Table 8.3 and in Fig. 8.11. It will be noted that the
distribution is markedly skewed to the left. This would be expected.
Recalling that the limits of the correlation coefficient are -1 and
+1, we see that the sample coefficients cannot exceed the population
value .85 by more than .15, but can fall below .85 by as much
as 1.85. This situation of course invites skewness.

TABLE 8.3
DISTRIBUTION OF $r_{xy}$ AND $z_r$ EQUIVALENTS IN
92 SAMPLES OF 25 EACH FROM A POPULATION
IN WHICH $\tilde{r}_{xy}$ = .85

VALUE OF r_xy   FREQUENCY     VALUE OF z_r   FREQUENCY
  .93-.97           7           1.90-1.99        1
  .88-.92          22           1.80-1.89        2
  .83-.87          32           1.70-1.79        1
  .78-.82          16           1.60-1.69        3
  .73-.77           8           1.50-1.59        1
  .68-.72           3           1.40-1.49        9
  .63-.67           1           1.30-1.39       15
  .58-.62           2           1.20-1.29       20
  .53-.57           0           1.10-1.19       13
  .48-.52           1           1.00-1.09        1
                                 .90- .99        4
                                 .80- .89        2
                                 .70- .79        3
                                 .60- .69        0
                                 .50- .59        1

Fig. 8.11. Distribution of $r_{xy}$ in 92 random samples.
(From Table 8.3.)

The 92 sample $r_{xy}$'s were transformed to $z_r$ equivalents by use of
Table C, Appendix C. (If such a table were not available, the
transformation could be made by use of formula [8.10].) The distribution
of the 92 $z_r$ equivalents is shown in the right-hand half of Table
8.3 and in Fig. 8.12. The mean of the distribution of $z_r$ is 1.26, in
agreement to three-figure accuracy with the population value, and
the standard deviation is .25, in fair agreement with the value .21
obtained from formula (8.11).

Fig. 8.12. Distribution of $z_r$ in 92 random samples.
(From Table 8.3.)

Regardless of size of sample and the value of $\tilde{r}_{xy}$, the sampling
distribution of $z_r$ is very nearly normal in form, with mean $\tilde{z}_r$ and
standard deviation $1/\sqrt{N - 3}$. Hence, we may confidently utilize
the normal probability scale in making inferences from a
product-moment correlation coefficient, provided we transform observed
and hypothesized values to their $z_r$ equivalents. The procedure
parallels that described above in connection with the mean, the
three commonly used 5 per cent regions of rejection being
determined by the points at $-1.64\sigma_{z_r}$, $\pm 1.96\sigma_{z_r}$, and $+1.64\sigma_{z_r}$. (Cf. Fig.
8.9.)


EXAMPLE 1. In a sample of 5, speed of response and accuracy
of response in arithmetic reasoning are correlated with $r_{xy}$ =
-.65. Does this demonstrate negative relationship between
speed and accuracy in arithmetic reasoning in the population,
i.e., is the hypothesis $\tilde{r}_{xy} > 0$ unsound?

In accordance with the discussion above, we proceed as
follows:

$$r_{xy} = -.65, \qquad z_r = -.78 \ \text{(from Table C)}$$
$$\tilde{r}_{xy} = 0, \qquad \tilde{z}_r = 0$$
$$\sigma_{z_r} = \frac{1}{\sqrt{5 - 3}} = .71$$
$$CR = \frac{-.78 - 0}{.71} = -1.10$$

Since CR falls short of the 5 per cent region of rejection,
the hypothesis cannot be rejected. The probability figure for
the hypothesis $\tilde{r}_{xy} > 0$ is given by the area to the left of -1.10.
This is .14.
EXAMPLE 2. The coefficient of correlation between the VAT
scores and first semester average grades in the sample of 138
college freshmen, Table II, Appendix B, is .46. Is the
hypothesis tenable that, in the population represented by this
sample, $\tilde{r}_{xy} = .60$?

The $z_r$ equivalents of observed and hypothesized coefficients
are .50 and .69, respectively; and $\sigma_{z_r}$ is $1/\sqrt{138 - 3}$ or .086.
Hence,

$$CR = \frac{.50 - .69}{.086} = -2.21,$$

or, disregarding sign, 2.21. Since CR
falls in the 5 per cent region of rejection, the hypothesis $\tilde{r}_{xy}$ =
.60 can be rejected at the 5 per cent level. Doubling the area
lying beyond 2.21 under the normal curve gives the probability
figure P = .027.

EXAMPLE 3. What are the 95 per cent confidence limits of $\tilde{r}_{xy}$
in example (2) above?

Since the sample $z_r$ is .50 with $\sigma_{z_r}$ = .086, we obtain,
paralleling the procedure stated in formulas (8.7), .50 - 1.96(.086)
or .33 as the lower limit of $\tilde{z}_r$ and .50 + 1.96(.086) or .67 as the
upper limit. Turning these back to $r_{xy}$'s, we have .32 and .58,
respectively, as the lower and upper 95 per cent confidence
limits of $\tilde{r}_{xy}$.
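
Examples 2 and 3 can be sketched in code without a conversion table, since the transformation $z_r = \tfrac{1}{2}\log_e[(1 + r)/(1 - r)]$ is the inverse hyperbolic tangent. The function names below are hypothetical. Because the text rounds the $z_r$ values to two places before dividing, the unrounded critical ratio differs slightly (about -2.28 rather than -2.21), while the confidence limits agree to two places.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_z(r):
    """z_r = 1/2 log_e[(1 + r)/(1 - r)], which is atanh(r)."""
    return atanh(r)


def z_standard_error(n):
    """Standard error of z_r: 1 / sqrt(N - 3)."""
    return 1 / sqrt(n - 3)


def correlation_critical_ratio(r_sample, r_hypothesized, n):
    return (fisher_z(r_sample) - fisher_z(r_hypothesized)) / z_standard_error(n)


def correlation_confidence_limits(r_sample, n, level=0.95):
    """Set limits on the z scale, then turn them back into correlations."""
    z = fisher_z(r_sample)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * z_standard_error(n)
    return tanh(z - half_width), tanh(z + half_width)


cr = correlation_critical_ratio(0.46, 0.60, 138)
lower, upper = correlation_confidence_limits(0.46, 138)
print(round(cr, 2), round(lower, 2), round(upper, 2))
```

Note that the limits are not symmetrical about .46; the asymmetry is introduced when the $z_r$ limits are converted back to correlation coefficients.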


The $z_r$ transformation has another important application. When
several sample $r_{xy}$'s from a given population or comparable
populations are available, it may be desirable to combine the $r_{xy}$'s and
thus to obtain an average coefficient. The best method of doing
this is to compute the weighted mean of the $z_r$ equivalents of the
$r_{xy}$'s, each being weighted by a number 3 less than the number in
the corresponding sample.


EXAMPLE. Over the past 3 years in the college of the freshmen
of Table II, Appendix B, the correlations between VAT and
first semester average grades were .54 in a class of 62, .59 in
a class of 97, and .46 in a class of 138. What is the average of
these three coefficients?

The $z_r$ equivalent of .54 is .60, that of .59 is .68, and that
of .46 is .50. Since the weight for each of these $z_r$ equivalents
is 3 less than the number in the sample, we have

$$\text{mean } z_r = \frac{59(.60) + 94(.68) + 135(.50)}{59 + 94 + 135} = .58.$$

The $r_{xy}$ equivalent of this mean $z_r$ is .52; hence the average
of the coefficients is .52.
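
The averaging procedure of this example is sketched below. The function name `average_correlation` is hypothetical, and the transformation is computed directly with `atanh` and `tanh` instead of being read from Table C.

```python
from math import atanh, tanh


def average_correlation(samples):
    """Average several r's through their z equivalents.

    `samples` is a list of (r, n) pairs; each z is weighted by n - 3.
    """
    total_weight = sum(n - 3 for _, n in samples)
    weighted_z_sum = sum((n - 3) * atanh(r) for r, n in samples)
    mean_z = weighted_z_sum / total_weight
    return tanh(mean_z)  # turn the mean z back into a correlation


classes = [(0.54, 62), (0.59, 97), (0.46, 138)]
average_r = average_correlation(classes)
print(round(average_r, 2))
```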


The $z_r$ transformation is applicable to partial correlation
coefficients, since these are essentially product-moment coefficients.
The standard errors of the $z_r$'s corresponding to $r_{12.3}$ and $r_{12.34}$ are
given in Table 8.4. Using these standard errors, we may make
inferences regarding population parameters $\tilde{r}_{12.3}$ and $\tilde{r}_{12.34}$ in exactly
the same way that we make inferences regarding an $\tilde{r}_{xy}$.

Before leaving the $z_r$ transformation, let us emphasize the fact
that it is applicable only to the product-moment coefficient of
correlation for continuous data. It cannot be applied to the various
other coefficients described in Chapter VI, and hence these
coefficients do not permit exact inferences. This fact marks another
advantage $r_{xy}$ has over other measures of relationship.

Standard Errors of Commonly Used Statistics. The standard 
errors of most of the statistics discussed in preceding chapters are 
included in Table 8.4. 

The procedure in using these standard errors to test hypotheses
and to set up confidence intervals of population parameters is
similar to that described in connection with the mean. A few
examples will serve further to illustrate the procedure.

TABLE 8.4

STANDARD ERRORS* OF VARIOUS STATISTICS WHOSE
SAMPLING DISTRIBUTIONS APPROXIMATE NORMALITY

STATISTIC                                       STANDARD ERROR

Arithmetic mean, M                              $\sigma_M = \sigma/\sqrt{N}$
Median, Mdn
Quartiles, $Q_1$ and $Q_3$
Logarithm of geometric mean, log GM             $\sigma_{\log GM} = (\sigma \text{ of log values})/\sqrt{N}$
Standard deviation, $\sigma$
Average or mean deviation, AD
Decile deviation, D (= $P_{90} - P_{10}$)
Quartile deviation, Q
Coefficient of variation, CV
z transformation of $r_{xy}$, $z_r$                  $\sigma_{z_r} = 1/\sqrt{N - 3}$
z transformation of $r_{12.3}$, $z_{r_{12.3}}$
z transformation of $r_{12.34}$, $z_{r_{12.34}}$
Biserial correlation coefficient, $r_b$
Regression coefficient, b
Beta coefficients, $\beta_{12.3}$ and $\beta_{13.2}$      $\sigma_{\beta_{12.3}} = \sigma_{\beta_{13.2}} = \sqrt{(1 - R_{1.23}^2)/[(1 - r_{23}^2)N]}$
Beta coefficient, $\beta_{12.34}$                    $\sigma_{\beta_{12.34}} = \sqrt{(1 - R_{1.234}^2)/[(1 - R_{2.34}^2)N]}$
                                                (To get $\sigma_{\beta_{13.24}}$, replace $R_{2.34}$
                                                with $R_{3.24}$; to get $\sigma_{\beta_{14.23}}$,
                                                replace $R_{2.34}$ with $R_{4.23}$.)

* These standard errors are approximate and none, except those of the $z_r$
transformations, can be relied upon when N is less than about 30. (See
p. 430.) The standard errors of the multiple coefficient of correlation, the
correlation coefficients, ra, rj», гу, and ге, and the correlation ratio are not
included. Inferences from these statistics can rarely be satisfactorily made by
use of the normal probability scale and require methods to be described later.


EXAMPLE 1. What are the 95 per cent confidence limits of the
geometric mean of the distribution of response times of
student 1, Table 3.8, p. 106?

Since the standard error of GM is expressed in logarithms,
we must work with the log measures in columns 5 and 6 of
the table. The mean of these measures is about 1.06 and the
standard deviation .30. Hence, $\sigma_{\log GM} = .30/\sqrt{55} = .041$. The
95 per cent confidence limits of log GM are $1.06 \pm 1.96(.041)$
or .98 and 1.14. Turning these logs into numbers, we have 9.6
and 13.8, approximately, as the 95 per cent confidence limits
of GM.

Note that in this case the population must be thought of as
the response times of student 1 to an indefinitely large number
of tasks similar to the 55 tasks performed. The analysis tells
us that the interval 9.6-13.8 has a .95 probability of covering
the $\tilde{GM}$ of this population, and thus is the 95 per cent
confidence interval of the “true” speed of response of student 1
to such tasks.
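
The steps of Example 1 (set limits on the logarithmic scale, then take antilogs) can be sketched as follows. The function name `geometric_mean_limits` is hypothetical, and common (base-10) logarithms are assumed, which is consistent with the antilogs 9.6 and 13.8 given in the example.

```python
from math import sqrt


def geometric_mean_limits(mean_log, sd_log, n, cr=1.96):
    """Confidence limits of a geometric mean from the mean and SD of log scores."""
    se_log = sd_log / sqrt(n)          # standard error of log GM
    lower_log = mean_log - cr * se_log
    upper_log = mean_log + cr * se_log
    return 10 ** lower_log, 10 ** upper_log  # antilogs, base 10


lower, upper = geometric_mean_limits(1.06, 0.30, 55)
print(round(lower, 1), round(upper, 1))
```

The resulting interval is not symmetrical about the geometric mean itself, since equal distances on the log scale are unequal distances on the raw scale.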
EXAMPLE 2. In a three-variable correlation problem the
following statistics were observed in a sample of 46: $\beta_{12.3}$ = .31,
$\beta_{13.2}$ = .40, $r_{23}$ = .80, and $R_{1.23}$ = .65. Are these beta
coefficients significant?

The hypotheses to be tested are $\tilde{\beta}_{12.3} = 0$ and $\tilde{\beta}_{13.2} = 0$. By
the formula given in Table 8.4, the standard error of both
betas is

$$\sqrt{\frac{1 - (.65)^2}{[1 - (.80)^2][46]}}$$

or .19. Hence, we have the critical ratios

$$CR = \frac{.31 - 0}{.19} = 1.63,$$
$$CR = \frac{.40 - 0}{.19} = 2.11.$$

The CR's indicate that $\beta_{12.3}$ is not significant, but that $\beta_{13.2}$ is
significant at the 5 per cent level. The P's corresponding to
the CR's are about .11 and .03, respectively.

Note that the standard error attaching to the coefficients
is .19. This indicates that the estimates of the population
betas provided by the sample may be in substantial error. In
such situations, even if both beta coefficients are significant,
the regression equation tends to be quite unstable, and hence
should be used cautiously in prediction. Note also that for
given values of $R_{1.23}$ and N, the standard error of the betas
decreases as $r_{23}$ decreases. Other things being equal, the less
the correlation between predictor variables, the greater the
reliability of the betas.
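
The standard error used in Example 2 and the closing remark about correlated predictors can be illustrated numerically. The sketch assumes the formula $\sqrt{(1 - R_{1.23}^2)/[(1 - r_{23}^2)N]}$ as applied in the example; the function name and the comparison value $r_{23}$ = .40 are hypothetical and introduced only for illustration.

```python
from math import sqrt


def beta_standard_error(multiple_r, r_between_predictors, n):
    """Standard error of a beta coefficient in a three-variable problem."""
    return sqrt((1 - multiple_r ** 2) / ((1 - r_between_predictors ** 2) * n))


# Values from Example 2: R = .65, r23 = .80, N = 46.
se = beta_standard_error(0.65, 0.80, 46)
cr_12_3 = 0.31 / se
cr_13_2 = 0.40 / se

# Same R and N, but less correlation between the predictors (illustrative value).
se_weaker = beta_standard_error(0.65, 0.40, 46)
print(round(se, 2), round(se_weaker, 2))
```

With the standard error carried to more places than the .19 of the text, the critical ratios differ slightly from 1.63 and 2.11, but the conclusions are unchanged.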


For two reasons the use of the normal probability scale in making 
inferences from any of the statistics, except the z, transformations, 
in Table 8.4 must be considered as giving only approximate re- 
sults. First, the standard errors of the statistics are stated in 
terms of sample statistics, rather than the unknown population 
parameters which are assumed in their derivation. Consequently 
the standard errors as stated are themselves inexact. Second, the 
sampling distributions of the statistics are only approximately 
normal in form. As sample size increases, the precision of the 
standard errors increases and the sampling distributions approxi- 
mate normality more closely. When sample size is, say, 60 or 
more, the use of the normal probability scale ordinarily gives 
results which compare favorably with those obtained by more 
exact methods. When sample size is, say, 30 or less, the normal 
probability scale is not appropriate. 

The “60-or-more” and “30-or-less” statements are only rough
guides. Approximations of varying degrees of exactness do not,
of course, suddenly become acceptable or unacceptable. Moreover,
the standard errors and sampling distributions of different
statistics are affected differently by changing sample size. Then, 
too, quite inexact inferences may be satisfactory in quick and 
rough analysis. The only general statement that can be made is 
that the results obtained by use of the normal probability scale 
are generally inexact, although they become steadily better with 
increasing sample size. More exact methods of making inferences 
from most of the statistics listed in Table 8.4, applicable to both 
large and small samples, will be discussed later.

Other Uses of the Standard Error. The standard error has
wide utility in educational research. We have already seen how it 
is used in making inferences about a population parameter. We 
shall later find a similar use for it in making inferences about dif- 
ferences between parameters in two populations. 

The standard error provides a convenient and readily inter- 
preted index of the reliability of a statistic. When a statistic and 
its standard error are reported, the reader is in position to evaluate 
the reliability of the statistic according to his own standards and 
to his own satisfaction. As is shown below, the standard error 
may be used, under certain conditions, to anticipate the size of 
the sample needed to insure a specified reliability. 

Professor Kelley (Ref. 10, p. 222) has recommended that the 
standard error of a sample statistic be used to determine the num- 
ber of places to retain in reporting the statistic. The recommenda- 
tion, which he calls the “one-third sigma rule," is to terminate 
“a published statistic with the decimal place given by the first 
figure of one-third of its standard error." To illustrate the
application of the rule, suppose that the computed mean of a sample is
75.643, the standard deviation is 8.961, $\sigma_M$ is 1.12, and $\sigma_\sigma$ is .79.
Since the first figure of one-third of both $\sigma_M$ and $\sigma_\sigma$ is in the tenths
place, according to the rule, the mean would be reported as 75.6
and the standard deviation 9.0.
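
One way to mechanize the one-third sigma rule is sketched below. This is an interpretation, not Kelley's own procedure: the function name `one_third_sigma_round` is hypothetical, and the "first figure" of one-third of the standard error is located by means of its common logarithm.

```python
from math import floor, log10


def one_third_sigma_round(value, standard_error):
    """Round `value` to the decimal place of the first figure of SE/3."""
    one_third = standard_error / 3
    # Number of decimal places at which the first significant figure falls;
    # for .373 this is 1 (the tenths place).
    decimals = -floor(log10(one_third))
    return round(value, decimals)


reported_mean = one_third_sigma_round(75.643, 1.12)
reported_sd = one_third_sigma_round(8.961, 0.79)
print(reported_mean, reported_sd)
```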

The one-third sigma rule is well justified in view of sampling
fluctuations; in fact, a more stringent rule could be justified. The
rule applies only to a sample statistic from which inferences about
the corresponding population parameter are made. It is not
applicable to a statistic used in summarizing and describing a
particular sample. In the latter case, the usual rules governing
computation with approximate numbers should be followed. Such rules
have been given in previous chapters.

Sample Size Needed for a Given Reliability. As has been 
noted, the reliability of a sample statistic as an estimate of the 
corresponding population parameter ordinarily is expressed in 
terms of a confidence interval which has a specified probability 
of covering the value of the parameter. Since the width of the 
confidence interval varies with the standard error, which in turn 
varies with the size of the sample, under certain conditions it is
possible to anticipate the size of the sample needed to have a
specified reliability. We shall illustrate the procedure with
reference to the mean.

Suppose that it is desired to determine approximately the size N
of a sample such that the probability is .95 that the interval 3 units
below and 3 units above the sample mean will include the
population mean, and suppose it is known or can be guessed with fair
accuracy that the standard deviation $\tilde{\sigma}$ in the sampled population
is about 8. Since the symmetrical 95 per cent confidence limits are
at $\pm 1.96\sigma_M$ we have, by substitution in formula (8.6),

$$1.96 = \frac{3}{8/\sqrt{N}},$$

so that the value of N, in round numbers, is 27.

In general, the size of the sample such that the probability is P
that the interval u units below and above the sample mean will
include the population mean may be determined approximately
from the formula

$$N = \frac{(CR)^2\sigma^2}{u^2}, \tag{8.12}$$

using the value of CR which corresponds to P. If P is set equal to
.90, CR = 1.64; if P is set equal to .99, CR = 2.58; and so on.

When the data from a pilot or preliminary study are available, 
it is possible to determine the approximate size of the sample 
needed to yield, under similar conditions, a mean M such that the 
probability is P that M is in error by no more than a specified per 
cent of itself as an estimate of the population mean. For example, 
suppose that in a pilot study, M = 75 and $\sigma$ = 12 and suppose it is
desired to anticipate the sample size N such that the probability
is .99 that the interval $75 \pm .02(75)$ includes the population mean.
Substituting in formula (8.12) we have

$$N = \frac{(2.58)^2(12)^2}{(.02 \times 75)^2} = 426.$$
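
Both sample-size calculations for the mean follow from $N = (CR)^2\sigma^2/u^2$ and can be checked in a few lines. The function name `sample_size_for_mean` is hypothetical; the CR values 1.96 and 2.58 are those used in the text.

```python
def sample_size_for_mean(cr, sd, tolerance):
    """N such that the sample mean lies within `tolerance` units of the
    population mean with the probability corresponding to `cr`."""
    return (cr * sd / tolerance) ** 2


# First illustration: sd about 8, tolerance 3 units, probability .95.
n_first = sample_size_for_mean(1.96, 8, 3)

# Pilot-study illustration: M = 75, sd = 12, tolerance 2 per cent of M, probability .99.
n_pilot = sample_size_for_mean(2.58, 12, 0.02 * 75)
print(round(n_first), round(n_pilot))
```

Since N varies inversely with the square of the tolerance, halving the tolerance quadruples the sample needed.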


By similar methods we can anticipate the size of the sample
needed to provide an estimate, of specified precision, of a
population proportion. Analogous to formula (8.12) we may write

$$N = \frac{(CR)^2 pq}{u^2}, \tag{8.13}$$

where p and q are estimates of the population proportions and u is
the absolute deviation, in the proportions unit, to be tolerated.
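
The corresponding calculation for a proportion, $N = (CR)^2pq/u^2$, is sketched below. The text gives no worked example, so the figures used here (an estimated proportion of .50 and a tolerated deviation of .05 at the 95 per cent level) are purely illustrative assumptions, as is the function name.

```python
def sample_size_for_proportion(cr, p, tolerance):
    """N such that the sample proportion lies within `tolerance` of the
    population proportion with the probability corresponding to `cr`."""
    q = 1 - p
    return cr ** 2 * p * q / tolerance ** 2


# Illustrative values only: p estimated at .50, tolerance .05, CR = 1.96.
n_needed = sample_size_for_proportion(1.96, 0.50, 0.05)
print(round(n_needed, 2))
```

Because pq is greatest when p = .50, taking p = .50 gives the largest, and hence the safest, estimate of N when little is known about the population proportion.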

When the statistics from a preliminary study are available, or 
when their values can be guessed with fair accuracy, the size of the 
sample needed to impart a given reliability to any of the statistics 
listed in Table 8.4 can be anticipated by procedures essentially 
similar to those described above. 

There is, of course, a great deal of uncertainty in such calcula- 
tions. The statistics from a preliminary study, particularly if the 
sample was small, may be substantially different from those ob- 
served in the final sample. To be on the safe side, it is well to take 
samples somewhat larger than the size calculated. 

Although the calculations of needed sample size are quite un- 
certain, they are generally of considerable value in the initial stages 
of planning a research study. In some situations, it may be found 
that a reduction in sample size is possible, and time and money 
thereby saved; in others, it may be found that samples of size suffi- 
cient to impart a needed reliability to the statistics in question are 
impossible, and wasted work thereby avoided. 

In our discussion we have considered only the problem of antici- 
pating sample size needed to provide confidence intervals of desired 
precision or width. It is possible to anticipate, for any selected risk 
of rejecting a true hypothesis, the size of the sample needed to limit 
the risk of accepting a false hypothesis to some specified amount. 
This problem is somewhat complicated, and we shall not deal with 
it here. The student will find various aspects of the problem dis- 
cussed in Ref. 3. 

Significance of Differences between Means of Two Inde- 
pendent Samples. The most common problem in research, per- 
haps, is to determine whether two samples differ sufficiently in one 
or more characteristics to discredit the hypothesis that the samples 
are from populations similar in the characteristic(s) chosen for 
comparison. If the differences between the samples are too great to 
be reasonably attributed to sampling fluctuations, the null hypothesis
is rejected and the conclusion follows that real differences exist
in the populations from which the samples are drawn.

For example, if the differences between a sample which has under- 
gone an experimental treatment and one which has not are too great 
to be attributed to chance, it follows that the treatment is responsi- 
ble for a real effect which will be observed again if further compara- 
ble samples are so treated. As another example, if the difference in, 
say, problem solving ability between samples from populations of 
boys and girls cannot reasonably be ascribed to chance, the con- 
clusion follows that the populations do in fact differ in problem 
solving ability. Differences which cannot be reasonably ascribed to 
chance are said to be significant. The remarks of Professor Fisher, 
quoted on page 6, well describe the general nature of the question 
of the significance of differences between samples. 

When the samples are measured quantitatively, the difference
$M_1 - M_2$ between the sample means is the statistic ordinarily
employed in the comparison. It can be shown that, if the samples are
independent and if the sampling distributions of $M_1$ and $M_2$ are
normal, the difference $M_1 - M_2$ has a normal sampling distribution
with mean equal to $\tilde{M}_1 - \tilde{M}_2$ and with standard deviation

$$\sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}. \tag{8.14}$$

This standard deviation is called the standard error of the difference
between independent sample means.

The application of this theory in testing an hypothesis about the 
difference between two population means is similar to that followed 
in testing an hypothesis about a single population mean. The crucial 
statistic is the critical ratio CR which is, in words,

$$CR = \frac{(\text{difference between sample means}) - (\text{hypothesized diff. between population means})}{\text{standard error of difference between means}}.$$

The value of CR needed in order to reject an hypothesis regarding
the difference $\tilde{M}_1 - \tilde{M}_2$ at a specified level of significance is the
same as that needed to reject an hypothesis about a single
population mean. As a rule, the hypothesis of concern is that $\tilde{M}_1 - \tilde{M}_2 = 0$,
i.e., that the samples are from populations having equal means.
In testing this hypothesis, the symmetrical or two-sided region of
rejection is appropriate. Consequently, a |CR| equal to or greater
than 1.96 is needed for rejection at the 5 per cent level. However,
if the hypothesis is that the difference $\tilde{M}_1 - \tilde{M}_2$ is as much as some
proposed amount, the one-sided region of rejection at -1.64 or
+1.64, depending upon the direction of the difference, is
appropriate at the 5 per cent level.

The theory is illustrated in the following example: 


EXAMPLE. For the 99 public school students of Table II,
Appendix B, the mean $M_1$ and the standard deviation $\sigma_1$ of
semester averages are 75.23 and 7.62 respectively. For the
47 private school students, the mean $M_2$ and the standard
deviation $\sigma_2$ are 72.85 and 9.02. Is the difference between means
significant?

We shall assume that the sampling distributions of $M_1$ and
$M_2$ are normal. The hypothesis to be tested is that $\tilde{M}_1 - \tilde{M}_2 = 0$.
If the hypothesis is true, the sampling distribution of
differences is normal, with mean of 0 and standard deviation
$\sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$. By formula (8.5) $\sigma_{M_1}^2 = (7.62)^2/99 = .59$ and
$\sigma_{M_2}^2 = (9.02)^2/47 = 1.73$. Hence, $\sigma_{M_1 - M_2} = \sqrt{.59 + 1.73} = 1.52$.
This gives us the critical ratio

$$CR = \frac{(75.23 - 72.85) - 0}{1.52} = 1.57.$$

Since the value of CR falls short of the ratio 1.96 needed for
rejection of the hypothesis at the 5 per cent level, the hypothesis
is tenable. No real difference between the means of semester
averages in the populations represented by the samples is
demonstrated by these data.


In estimating confidence limits of a difference between population
means, the procedure is similar to that followed in estimating the
confidence limits of any parameter. The 95 per cent confidence
limits are given by $(M_1 - M_2) \pm 1.96\sigma_{M_1 - M_2}$; the 99 per cent
confidence limits by $(M_1 - M_2) \pm 2.58\sigma_{M_1 - M_2}$; and so on. In the
example above, the 95 per cent confidence limits of the true difference
$\tilde{M}_1 - \tilde{M}_2$ are $(75.23 - 72.85) \pm 1.96(1.52)$ or about -.60 and 5.36.
Any hypothesis proposing a value of $\tilde{M}_1 - \tilde{M}_2$ falling beyond these
limits would be rejected at the 5 per cent or better level.

Confidence limits of the differences between other population
parameters, such as proportions and various other parameters 
whose differences are discussed in the following pages, may be deter- 
mined by procedures analogous to the above. In general, such con- 
fidence limits serve to separate hypotheses which are admissible 
from those which are not. 
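
For the difference between the means of the public and private school samples, the test and the confidence limits can be sketched as follows. The function name `se_difference_independent` is hypothetical. The standard error is carried to more places than in the text, so the critical ratio comes out at about 1.56 rather than 1.57; the limits agree to two places.

```python
from math import sqrt


def se_difference_independent(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means."""
    return sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


difference = 75.23 - 72.85
se_diff = se_difference_independent(7.62, 99, 9.02, 47)

cr = (difference - 0) / se_diff          # hypothesized difference is 0
lower = difference - 1.96 * se_diff      # 95 per cent confidence limits
upper = difference + 1.96 * se_diff
print(round(se_diff, 2), round(cr, 2), round(lower, 2), round(upper, 2))
```

That the interval includes 0 is another way of saying that the hypothesis of no difference cannot be rejected at the 5 per cent level.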

Significance of Differences between the Means of Two Cor- 
related Samples. It is frequently the case that the scores in one 
sample are paired with the scores in the other. This situation arises 
in single group experiments, in which the same individuals are
measured before and after an experimental treatment or are
measured at the end of one experimental treatment and measured again
at the end of a second. It arises in parallel group experiments, in
which each individual in the experimental group is matched with a 
comparable individual in the control group and measurements made 
on both groups. In general, the situation exists whenever the scores 
in two samples or the scores observed under two conditions can be 
meaningfully correlated. 

In such situations, the samples are not independent, and the
standard error of the difference between the means of the two
samples is

$$\sigma_{M_1-M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2r_{12}\,\sigma_{M_1}\sigma_{M_2}} \tag{8.15}$$

in which $r_{12}$ is the product-moment coefficient of correlation
between the paired scores. If $r_{12}$ is positive, as is usually the case, the
standard error of the difference between means of correlated samples
is smaller than that of independent samples, as is seen when
formulas (8.15) and (8.14) are compared.

In terms of deviation scores, formula (8.15) becomes

$$\sigma_{M_1-M_2} = \frac{1}{N}\sqrt{\Sigma x_1^2 + \Sigma x_2^2 - 2\Sigma x_1 x_2} \tag{8.15'}$$

and in terms of gross or raw scores

$$\sigma_{M_1-M_2} = \frac{1}{N}\sqrt{\frac{N\Sigma X_1^2 - (\Sigma X_1)^2 + N\Sigma X_2^2 - (\Sigma X_2)^2 - 2(N\Sigma X_1X_2 - \Sigma X_1\Sigma X_2)}{N}} \tag{8.15''}$$

If the scores are ungrouped and if the values of $\sigma_{M_1}$, $\sigma_{M_2}$, and $r_{12}$ are
of no concern, (8.15'') or (8.15') usually is more convenient than
(8.15).

It may be shown (Ref. 16, p. 167) that the standard error of the
difference between the means of two correlated samples is equal to
the standard error of the mean of the differences between the scores
of the paired individuals. If we use D to represent the differences
between raw scores, we may write

$$\sigma_{M_1-M_2} = \frac{\sigma_D}{\sqrt{N}} = \frac{1}{N}\sqrt{\frac{N\Sigma D^2 - (\Sigma D)^2}{N}} \tag{8.16}$$

in which $\sigma_D$ is the standard deviation of the differences between
scores.

The equivalence of formulas (8.16) and (8.15'') is seen in the
calculations at the foot of Table 8.5. The data in the table are the
scores $X_1$ on a superstition test of 28 high school students who had
three years of science and the scores $X_2$ of 28 individuals who had
one year of science, the individuals being matched on socioeconomic
status and IQ.

After the standard error of the difference between the means of 
correlated samples is obtained by one of the above procedures, the 
test of significance of the difference between means is similar to that 
for independent samples. For example, the difference between $M_1$
and $M_2$ of the scores in Table 8.5 is −1.43. The hypothesis of
concern here is that $\tilde{M}_1 - \tilde{M}_2 = 0$. The standard error of the
difference is .95, so that CR = −1.43/.95 or 1.51, disregarding sign. This
falls short of the value 1.96 needed for significance at the 5 per cent
level, and the hypothesis cannot be rejected. The corresponding
probability figure, as obtained from a table of normal areas, is about
.13. The study fails to demonstrate that students having three
years of high school science and students having one year differ
significantly in superstitious beliefs, as measured by the test.
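
The equivalence of the raw-score formula and the difference-score formula may also be checked by machine. The following is a minimal sketch, assuming the 28 pairs of scores as read from Table 8.5; the function names are illustrative, and the probability figure is obtained from the normal curve by way of the complementary error function rather than from a printed table.

```python
from math import sqrt, erfc

# Paired scores as read from Table 8.5 (X1: three years of science,
# X2: one year of science).
x1 = [8, 18, 20, 14, 22, 18, 23, 5, 4, 11, 7, 27, 16, 2,
      4, 6, 1, 7, 9, 10, 9, 10, 13, 8, 17, 12, 10, 10]
x2 = [15, 19, 16, 17, 18, 18, 16, 17, 2, 19, 6, 25, 9, 11,
      4, 8, 9, 7, 7, 17, 15, 16, 10, 13, 14, 8, 15, 10]


def se_from_raw_scores(a, b):
    """Standard error of the difference between correlated means,
    computed from raw-score sums and cross products."""
    n = len(a)
    sum_a, sum_b = sum(a), sum(b)
    ss_a = sum(v * v for v in a)
    ss_b = sum(v * v for v in b)
    cross = sum(u * v for u, v in zip(a, b))
    inner = (n * ss_a - sum_a ** 2 + n * ss_b - sum_b ** 2
             - 2 * (n * cross - sum_a * sum_b)) / n
    return sqrt(inner) / n


def se_from_differences(a, b):
    """The same standard error, computed from the paired differences D."""
    d = [u - v for u, v in zip(a, b)]
    n = len(d)
    return sqrt((n * sum(v * v for v in d) - sum(d) ** 2) / n) / n


se_raw = se_from_raw_scores(x1, x2)
se_diff = se_from_differences(x1, x2)

mean_difference = sum(x1) / len(x1) - sum(x2) / len(x2)
critical_ratio = abs(mean_difference) / se_diff

# Two-sided probability of a normal deviate at least this large.
probability = erfc(critical_ratio / sqrt(2))
```

Both routes give a standard error of about .95, and the critical ratio of about 1.5 corresponds to a probability of about .13, as stated above.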

TABLE 8.5

COMPUTATION OF STANDARD ERROR OF DIFFERENCE
BETWEEN MEANS OF TWO CORRELATED SAMPLES

Pair     X1     X2     X1²     X2²    X1X2   D = X1 - X2    D²
  1       8     15      64     225     120       -7         49
  2      18     19     324     361     342       -1          1
  3      20     16     400     256     320        4         16
  4      14     17     196     289     238       -3          9
  5      22     18     484     324     396        4         16
  6      18     18     324     324     324        0          0
  7      23     16     529     256     368        7         49
  8       5     17      25     289      85      -12        144
  9       4      2      16       4       8        2          4
 10      11     19     121     361     209       -8         64
 11       7      6      49      36      42        1          1
 12      27     25     729     625     675        2          4
 13      16      9     256      81     144        7         49
 14       2     11       4     121      22       -9         81
 15       4      4      16      16      16        0          0
 16       6      8      36      64      48       -2          4
 17       1      9       1      81       9       -8         64
 18       7      7      49      49      49        0          0
 19       9      7      81      49      63        2          4
 20      10     17     100     289     170       -7         49
 21       9     15      81     225     135       -6         36
 22      10     16     100     256     160       -6         36
 23      13     10     169     100     130        3          9
 24       8     13      64     169     104       -5         25
 25      17     14     289     196     238        3          9
 26      12      8     144      64      96        4         16
 27      10     15     100     225     150       -5         25
 28      10     10     100     100     100        0          0

SUM     321    361   4,851   5,435   4,761      -40        764
MEAN  11.46  12.89                             -1.43

By formula (8.15''):

$$\sigma_{M_1-M_2} = \frac{1}{28}\sqrt{\frac{28 \times 4{,}851 - (321)^2 + 28 \times 5{,}435 - (361)^2 - 2(28 \times 4{,}761 - 321 \times 361)}{28}} = \frac{1}{28}\sqrt{706.86} = .95$$

By formula (8.16):

$$\sigma_{M_1-M_2} = \frac{1}{28}\sqrt{\frac{28 \times 764 - (-40)^2}{28}} = \frac{1}{28}\sqrt{706.86} = .95$$

Significance of Differences between Independent Sample
Proportions. Suppose that we have two twofold populations in
which the proportions of individuals having a given attribute are
$\tilde{p}_1$ and $\tilde{p}_2$, respectively. In successive independent samples of sizes
$N_1$ and $N_2$ from these populations, the sampling distribution of the
differences $p_1 - p_2$ between sample proportions is approximately
normal, provided the binomial distributions of both proportions
are satisfactorily approximated by the normal curve. (See
rule-of-thumb, p. 404.) The mean of the distribution is $\tilde{p}_1 - \tilde{p}_2$ and the
standard deviation is

$$\sigma_{p_1-p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2} = \sqrt{\frac{\tilde{p}_1\tilde{q}_1}{N_1} + \frac{\tilde{p}_2\tilde{q}_2}{N_2}} \tag{8.17}$$

As a rule, the hypothesis to be tested is that $\tilde{p}_1 - \tilde{p}_2 = 0$, i.e.,
that the populations are alike in respect to the proportions of
individuals having the attribute in question. Under this hypothesis,
$\tilde{p}_1 = \tilde{p}_2 = \tilde{p}$, say. The best estimate of $\tilde{p}$ is the weighted mean p
of the sample proportions, as obtained from

$$p = \frac{N_1p_1 + N_2p_2}{N_1 + N_2}.$$

Hence, in testing the hypothesis $\tilde{p}_1 - \tilde{p}_2 = 0$, the standard error
of the difference is taken as

$$\sigma_{p_1-p_2} = \sqrt{\frac{pq}{N_1} + \frac{pq}{N_2}} \tag{8.18}$$

where q = 1 − p.

Let us apply this theory to the following problem. 


EXAMPLE. In the delinquency data of p. 260, 16 out of 35
youth from broken homes and 18 out of 65 youth from unbroken
homes were found to be delinquent. Is this difference
significant?

To answer the question, we hypothesize that $\tilde{p}_1 - \tilde{p}_2 = 0$.
We have $N_1$ = 35, $p_1$ = 16/35 = .46, $N_2$ = 65, $p_2$ = 18/65 =
.28, so that p = .34. By formula (8.18),

$$\sigma_{p_1-p_2} = \sqrt{.34 \times .66/35 + .34 \times .66/65} = .099$$

Hence, the critical ratio is

$$CR = \frac{(.46 - .28) - 0}{.099} = 1.82.$$

The hypothesis cannot be rejected at the 5 per cent level. The
corresponding probability figure is .07. The evidence is
insufficient to support the conclusion that in the sampled population
the incidence of delinquency is greater in broken than in
unbroken homes.


If desired, the percentage rather than the proportions unit may 
be used in problems of this sort. One needs merely to shift decimal 
points two places to the right. 
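
The test of a difference between independent sample proportions can be sketched as follows. The function name is illustrative; the computation uses the weighted mean of the sample proportions as the estimate of the common population proportion, as in formula (8.18), and carries full precision, so the critical ratio comes out at about 1.81 rather than the 1.82 obtained from the rounded proportions.

```python
from math import sqrt, erfc


def pooled_proportion_test(count1, n1, count2, n2):
    """Critical ratio for the hypothesis that two independent samples
    come from populations with equal proportions."""
    p1 = count1 / n1
    p2 = count2 / n2
    # Weighted mean of the two sample proportions.
    p = (count1 + count2) / (n1 + n2)
    q = 1 - p
    # Standard error of the difference under the hypothesis of no difference.
    se = sqrt(p * q / n1 + p * q / n2)
    cr = (p1 - p2) / se
    # Two-sided probability from the normal curve.
    probability = erfc(abs(cr) / sqrt(2))
    return p, se, cr, probability


# Delinquency data: 16 of 35 from broken homes, 18 of 65 from unbroken homes.
p, se, cr, probability = pooled_proportion_test(16, 35, 18, 65)
```

To work in percentages instead, each proportion and the standard error are simply multiplied by 100; the critical ratio is unchanged.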

Significance of Differences between Correlated Sample 
Proportions. There are several situations in which the samples 
from twofold populations are correlated. For instance, the propor- 
tion of individuals who respond in a certain way to a test or ques- 
tionnaire item may be compared with the proportion of the same 
individuals who respond in the same way to the item after some 
experimental treatment or who respond in the same way to a second 
test or questionnaire item. As a second instance, the proportion of 
individuals in a sample having a certain attribute may be compared 
with the proportion in a second sample, the individuals in the two 
samples being matched on some basis. 

For correlated samples, the standard error of the difference
$p_1 - p_2$ is, as shown by McNemar (Ref. 13),

$$\sigma_{p_1-p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2 - 2r_{12}\,\sigma_{p_1}\sigma_{p_2}} \tag{8.19}$$

in which the quantity $r_{12}$ is the fourfold point coefficient of
correlation.

If we express the standard errors under the radical sign in terms
of sample proportions and sizes and state $r_{12}$ as in formula (6.87),
formula (8.19) reduces to

$$\sigma_{p_1-p_2} = \sqrt{\frac{a + d - (a - d)^2}{N}} \tag{8.20}$$

where a and d are the relative frequencies in the first and fourth
cells of the 2 × 2-fold correlation table. (See Fig. 6.4.)

When the hypothesis to be tested is that $\tilde{p}_1 - \tilde{p}_2 = 0$, as is
usually the case, McNemar shows that formula (8.20) reduces to

$$\sigma_{p_1-p_2} = \sqrt{\frac{a + d}{N}} \tag{8.21}$$

After the standard error of the difference between proportions in
correlated samples is obtained, tests of significance are similar to
those for independent sample proportions.


EXAMPLE. At the beginning of a course in international
relations, the students were asked whether they believed war to
be inevitable. Of the 62 students enrolled, 37 replied "yes"
and 25 replied "no." At the end of the course, in response to
the same question, 22 replied "yes" and 40 "no." The tabulation
of the two sets of responses is shown below, the values of
a and d being in parentheses. Does the evidence indicate a
significant shift of opinion?

                          END OF COURSE
                          no          yes
BEGINNING OF    yes    20 (.32)      17         p1 = 37/62 = .60
COURSE          no     20             5 (.08)

The hypothesis to be tested is that $\tilde{p}_1 - \tilde{p}_2 = 0$. By formula
(8.21) the standard error is $\sqrt{(.32 + .08)/62}$ or about .080.
Since the sample proportions are $p_1$ = .60 and $p_2$ = .35, we
have the critical ratio

$$CR = \frac{(.60 - .35) - 0}{.080} = 3.1.$$

The hypothesis is rejected. The probability figure P
corresponding to a critical ratio of 3.1 is about .002. The evidence
indicates a highly significant shift of opinion. It may be quite
confidently expected that further comparable groups of students
under similar conditions will show a change of opinion
concerning war.
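
The test for correlated proportions depends only on the individuals who changed their response. A minimal sketch follows; the function name is illustrative, and the counts of 20 students changing from "yes" to "no" and 5 changing from "no" to "yes" are an assumption inferred from the relative frequencies .32 and .08 in a group of 62.

```python
from math import sqrt, erfc


def correlated_proportion_test(first_cell_count, fourth_cell_count, n):
    """Critical ratio for a difference between correlated proportions.

    first_cell_count  -- individuals counted in the first cell (a)
    fourth_cell_count -- individuals counted in the fourth cell (d)
    n                 -- number of paired individuals
    """
    a = first_cell_count / n
    d = fourth_cell_count / n
    # Standard error when no hypothesis of equality is imposed, as in (8.20).
    se_general = sqrt((a + d - (a - d) ** 2) / n)
    # Standard error under the hypothesis of no difference, as in (8.21).
    se_null = sqrt((a + d) / n)
    # The difference p1 - p2 is equal to a - d.
    cr = (a - d) / se_null
    probability = erfc(abs(cr) / sqrt(2))
    return se_general, se_null, cr, probability


se_general, se_null, cr, probability = correlated_proportion_test(20, 5, 62)
```

Carried at full precision the critical ratio is 3.0, slightly below the 3.1 obtained from the rounded proportions, and the conclusion is the same.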

Significance of Differences between Sample Standard
Deviations. Consider two populations in which the standard deviations
are $\tilde{\sigma}_1$ and $\tilde{\sigma}_2$, respectively, and suppose that we have
independent samples of sizes $N_1$ and $N_2$ with standard deviations $\sigma_1$ and
$\sigma_2$ from these populations. If neither $N_1$ nor $N_2$ is less than about
60, the distribution of the differences $\sigma_1 - \sigma_2$ is approximately
normal with mean $\tilde{\sigma}_1 - \tilde{\sigma}_2$ and standard deviation

$$\sigma_{\sigma_1-\sigma_2} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2} \tag{8.22}$$

By use of this standard error, an hypothesis regarding the true
difference $\tilde{\sigma}_1 - \tilde{\sigma}_2$ may be tested in the usual manner. However, if the
hypothesis of concern is that $\tilde{\sigma}_1 - \tilde{\sigma}_2 = 0$, i.e., that the populations
do not differ in variability, as measured by the standard
deviation, a better test is available in the variance-ratio test,
applicable to samples of any size. (See pp. 496-497.)

If the samples are correlated, the standard error of the difference
$\sigma_1 - \sigma_2$ is

$$\sigma_{\sigma_1-\sigma_2} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2 - 2r_{12}^2\,\sigma_{\sigma_1}\sigma_{\sigma_2}} \tag{8.23}$$

The variance-ratio test is not applicable in testing hypotheses
regarding the difference $\tilde{\sigma}_1 - \tilde{\sigma}_2$, if the samples are correlated. Such
hypotheses can be tested by referring the critical ratio involving
the standard error from formula (8.23) to the normal probability
scale.


EXAMPLE. Forms I and II of an intelligence test were given to
a sample of 72 students. The standard deviations of the two
sets of scores were 9.0 and 12.0, respectively, and the scores
were correlated with $r_{12}$ = .85. From this evidence, is there
reason to suppose that further samples will show unequal
variabilities on the two forms?

The hypothesis to be tested is that $\tilde{\sigma}_1 - \tilde{\sigma}_2 = 0$. Since the
standard error of the difference is, by formula (8.23),

$$\sigma_{\sigma_1-\sigma_2} = \sqrt{(.75)^2 + (1.00)^2 - 2(.85)^2(.75)(1.00)} = .69,$$

the critical ratio is

$$CR = \frac{(9.0 - 12.0) - 0}{.69} = -4.3.$$

The difference is more than four times its standard error;
hence, the hypothesis is very strongly discredited. Further
samples will almost certainly show unequal variabilities on the
two forms.
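
The computation in this example can be sketched as below. The standard error of a single standard deviation is taken as σ/√(2N), which is the usual large-sample value and is assumed here to be the one intended; the function names are illustrative.

```python
from math import sqrt


def se_standard_deviation(sd, n):
    """Large-sample standard error of a standard deviation (assumed form)."""
    return sd / sqrt(2 * n)


def se_sd_difference(sd1, sd2, n, r=0.0):
    """Standard error of the difference between two standard deviations
    from samples of equal size n.

    With r = 0 this is the independent-samples form; with r not zero it
    is the correlated-samples form, in which r enters as its square.
    """
    s1 = se_standard_deviation(sd1, n)
    s2 = se_standard_deviation(sd2, n)
    return sqrt(s1 ** 2 + s2 ** 2 - 2 * r ** 2 * s1 * s2)


# Forms I and II: N = 72, standard deviations 9.0 and 12.0, r = .85.
se_correlated = se_sd_difference(9.0, 12.0, 72, r=0.85)
critical_ratio = (9.0 - 12.0) / se_correlated

# For comparison only: the standard error if the correlation were ignored.
se_if_independent = se_sd_difference(9.0, 12.0, 72)
```

Ignoring the correlation would give a standard error of 1.25 instead of about .69, which shows how strongly the pairing sharpens the test.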


Significance of Differences between Coefficients of
Correlation. In drawing inferences from an observed difference between
correlation coefficients in two samples, the coefficients are first
transformed to $z_r$ equivalents. It can be shown that, in successive
independent samples from normal bivariate populations having equal
product-moment coefficients of correlation, the sampling distribution
of the differences $z_{r_1} - z_{r_2}$ is normal with mean of zero and
standard error

$$\sigma_{z_{r_1}-z_{r_2}} = \sqrt{\sigma_{z_{r_1}}^2 + \sigma_{z_{r_2}}^2} \tag{8.24}$$

To test the hypothesis that $\tilde{z}_{r_1} - \tilde{z}_{r_2} = 0$, i.e., that the populations
are alike in respect to correlation, the difference $z_{r_1} - z_{r_2}$ is divided
by its standard error, and the critical ratio is referred to the normal
probability scale as usual.


EXAMPLE. In one liberal arts college, the coefficient of correla- 
tion between VAT scores and freshman grades is .60 in a class 
of 125; in a second, the coefficient of correlation is .48 in a 
class of 96. Are these data consistent with the view that VAT 
scores and freshman grades are equally correlated in the 
populations represented by the samples? 

The hypothesis to be tested is that $\tilde{z}_{r_1} - \tilde{z}_{r_2} = 0$. The
equivalent $z_{r_1}$ of .60 is .69, and $\sigma_{z_{r_1}} = 1/\sqrt{122}$. The equivalent $z_{r_2}$
of .48 is .52, and $\sigma_{z_{r_2}} = 1/\sqrt{93}$. The standard error of the
difference is, by formula (8.24), $\sqrt{1/122 + 1/93}$ or about
.14, so that the critical ratio is

$$CR = \frac{(.69 - .52) - 0}{.14} = 1.2.$$

The hypothesis cannot be rejected. The data are not
inconsistent with the view that the population coefficients are
equal.


The significance of differences between coefficients of correlation
when the samples are not independent cannot be tested through the
$z_r$ transformation, because the standard error of differences between
correlated $z_r$'s is not known. One of the important problems in
educational measurement involving correlated samples, however, can
be dealt with by methods to be discussed later (pp. 467-468).
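
For independent samples, the comparison of two correlation coefficients through the z transformation can be sketched as follows. The function name is illustrative; the transformation is computed as the inverse hyperbolic tangent of r, and each z is given the standard error 1/√(N − 3) used in the example above.

```python
from math import atanh, sqrt, erfc


def z_difference_test(r1, n1, r2, n2):
    """Critical ratio for the difference between two independent
    correlation coefficients, by way of the z transformation."""
    z1 = atanh(r1)
    z2 = atanh(r2)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    cr = (z1 - z2) / se
    # Two-sided probability from the normal curve.
    probability = erfc(abs(cr) / sqrt(2))
    return z1, z2, se, cr, probability


# VAT scores and freshman grades: r = .60 with N = 125, r = .48 with N = 96.
z1, z2, se, cr, probability = z_difference_test(0.60, 125, 0.48, 96)
```

At full precision the critical ratio is about 1.24, in close agreement with the value of about 1.2 obtained from the rounded figures, and well short of 1.96.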

Significance of Differences between Other Independent 
Sample Statistics. The general ideas underlying tests of signifi- 
cance of differences between independent sample statistics, dis- 
cussed in preceding pages, are applicable to differences between 
various other independent sample statistics, such as medians, 
quartile deviations, and coefficients of variation. 

If the sampling distributions of the statistics chosen for compari- 
son are normal, the sampling distribution of the difference between 
independent sample statistics is normal, with mean and standard 
error analogous to those of the sampling distributions of differences 
discussed above. (See, for example, exr. 38.) 

The standard errors of the differences between correlated sample 
statistics, excepting the mean, proportion, and standard deviation,
are not known. A rough test of the significance of the difference
between the values of a statistic in two correlated samples may be 
made, using the standard error for independent samples. The latter 
error tends to be too large, and its use therefore tends to obscure 
significance. If, however, significance is indicated by the rough test, 
presumably it is safe to conclude that a difference exists between 
the populations represented by the correlated samples. 

Sampling from Finite Populations. In our discussion of the 
normal sampling distribution we assumed an infinite population or, 
at least, a population so large as compared with the sample that the 
probabilities of drawing particular individuals remain practically 
constant during the sampling process. 

When the population is finite, the sampling distribution of the
mean has a mean equal to that of the population, as in sampling
from an infinite population, but the standard error is somewhat
smaller than that stated in formula (8.5). For a sample of size N
from a population of size T, the standard error of the mean is
approximately

$$\sigma_M = \frac{\sigma}{\sqrt{N}}\sqrt{1 - \frac{N}{T}} \tag{8.25}$$

and the standard error of the proportion

$$\sigma_p = \sqrt{\frac{pq}{N}}\sqrt{1 - \frac{N}{T}} \tag{8.26}$$

It will be seen that the correction factor $\sqrt{1 - N/T}$ approaches
unity, and the correction therefore becomes negligible, as T becomes
large relative to N.
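
The effect of the finite population correction may be seen in a brief sketch. The function names and the numerical values (a standard deviation of 15, a sample of 100, and populations of 400 and 100,000) are hypothetical and chosen only for illustration.

```python
from math import sqrt


def finite_correction(n, t):
    """Correction factor for a sample of n drawn from a population of t."""
    if not 0 < n <= t:
        raise ValueError("sample size must be positive and not exceed the population size")
    return sqrt(1 - n / t)


def se_mean_finite(sd, n, t):
    """Approximate standard error of the mean in sampling from a finite population."""
    return sd / sqrt(n) * finite_correction(n, t)


def se_proportion_finite(p, n, t):
    """Approximate standard error of a proportion in sampling from a finite population."""
    return sqrt(p * (1 - p) / n) * finite_correction(n, t)


# Hypothetical illustration: the uncorrected standard error is 15 / 10 = 1.5.
se_small_population = se_mean_finite(15, 100, 400)       # sample is a quarter of the population
se_large_population = se_mean_finite(15, 100, 100_000)   # sample is a tiny fraction
```

When the sample is a quarter of the population the standard error falls from 1.5 to about 1.3; when the population is a thousand times the sample the correction is inconsequential.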

Sampling from Stratified Populations. It will be recalled 
from Chapter I that the method of stratified random sampling 
consists of drawing at random, from each of several strata, numbers 
of individuals proportional to the population numbers in the strata, 
it being assumed that differences in respect to the variable under 
investigation exist between population strata. When this assump- 
tion is well founded, the stratified random sample is better repre- 
sentative of the population than the simple random sample. 

The standard error of the mean of a stratified random sample is

$$\sigma_M = \frac{1}{N}\sqrt{N\sigma^2 - N_1(M_1 - M)^2 - N_2(M_2 - M)^2 - \cdots - N_k(M_k - M)^2} \tag{8.27}$$

where N is the number, M the mean, and σ the standard deviation
in the total sample; and $N_1, N_2, \ldots, N_k$ are the numbers and
$M_1, M_2, \ldots, M_k$ the means in the k strata. It will be seen that,
unless the various strata means are equal to the mean M of the
total sample, the standard error given by (8.27) will be less than
that given by (8.5).
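
The gain from stratification can be illustrated with a short sketch, in which the sum of squares between strata means is removed from the total sum of squares before the standard error is taken. The function name and the three strata (sizes 50, 100, and 50 with means 90, 100, and 110, and a total-sample standard deviation of 15) are hypothetical.

```python
from math import sqrt


def se_mean_stratified(total_sd, strata):
    """Standard error of the mean of a proportional stratified sample.

    total_sd -- standard deviation in the total sample
    strata   -- list of (number, mean) pairs, one pair per stratum
    """
    n = sum(n_i for n_i, _ in strata)
    grand_mean = sum(n_i * m_i for n_i, m_i in strata) / n
    # Weighted squared deviations of the strata means from the total mean.
    between = sum(n_i * (m_i - grand_mean) ** 2 for n_i, m_i in strata)
    return sqrt(n * total_sd ** 2 - between) / n


# Hypothetical strata, for illustration only.
strata = [(50, 90.0), (100, 100.0), (50, 110.0)]
se_stratified = se_mean_stratified(15.0, strata)

# Simple random sample of the same size and variability, for comparison.
se_simple = 15.0 / sqrt(200)

# When every stratum mean equals the total mean, nothing is gained.
se_equal_means = se_mean_stratified(15.0, [(50, 100.0), (100, 100.0), (50, 100.0)])
```

In this illustration the standard error falls from about 1.06 to about .94, and the reduction disappears when the strata means are equal.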

As noted in Chapter I, when the population strata differ in
variability, the simple proportional stratified sample is not the most
trustworthy. When this is the case, a sample selected so that num- 
bers from the several strata vary with the standard deviations in 
those strata reduces the standard error of the mean to a minimum 
for given sample size. (See Ref. 26.) It should be noted, however, 
that in educational research we rarely have knowledge of variability 
within strata and that consequently we rarely can reduce sampling 
error in this way. 

It is sometimes desired to test the significance of differences
among means of strata in the stratified sample. This can best be
done by analysis of variance techniques (see p. 498); but if the
differences among strata are of first concern it would usually be
preferable to take equal rather than proportional numbers from the
strata.

In the sampling of attributes, the standard error of the total
sample proportion p is, when the sample is large, approximately

$$\sigma_p = \frac{1}{N}\sqrt{Npq - N_1(p_1 - p)^2 - N_2(p_2 - p)^2 - \cdots - N_k(p_k - p)^2} \tag{8.28}$$

where N is the number in the total sample, p the proportion having
the attribute, q = 1 − p; and $N_1, N_2, \ldots, N_k$ the numbers, and
$p_1, p_2, \ldots, p_k$ the proportions in the k strata.

The Assumption of Normality. The sampling distribution of
the difference between two sample statistics is normal, provided the
sampling distribution of the statistic for each sample is normal.
This provision can be shown to be met only if the sampled populations
are normal. Thus, it is logically necessary to validate the
assumption of population normality before using the normal sampling
distribution.

When the sample is less than about 60, there are no satisfactory 
ways to determine whether the sample departs sufficiently from 
normality to discredit the assumption that its parent population 
is normal. In this case, the assumption can be examined only in 
light of what is known or believed to be true about the nature of the 
sample measures. For larger samples there are several ways to test
the assumption, the most useful of which are, perhaps, the χ² test
of "goodness of fit" and tests based upon the alpha measures of
skewness and kurtosis. The alpha tests are more sensitive than the
χ² test and have the additional advantage of indicating whether
nonnormality is due to skewness or to nonnormal peakedness or to
both. The χ² test will be discussed in a later section; at this point
we shall consider the alpha tests.

E. S. Pearson has determined the 90 and 98 per cent sampling
limits of $\alpha_3$ and $\alpha_4$ for various size samples, and the limits shown
in Table G are taken from his work. The table is read as follows:
In samples of 100 from a normal population, 90 per cent of the
$\alpha_3$ values may be expected to fall between −.39 and +.39 and
90 per cent of the $\alpha_4$ values may be expected to fall between 2.35
and 3.77; and so on.

To test the assumption of normality, we need only to compute
the $\alpha_3$ and $\alpha_4$ statistics of the sample and refer to Table G. As an
example, consider the sample data of Table 4.14 where N = 138,
$\alpha_3$ = .32, and $\alpha_4$ = 2.64. Referring these values to Table G, we
find that both of the alphas fall within the 90 per cent sampling
limits for samples of this size; hence, the assumption of population
normality is tenable.

It is the recommended practice to conclude that the assumption 
of normality is tenable when both sample alphas fall within the 
90 per cent limits; is in doubt when either falls beyond the 90 per 
cent limits; and is untenable when either falls beyond the 98 per 
cent limits. To use Table G in one-sided tests, the upper or lower 
tabled limit is taken as the criterion, and the probability is halved. 
Usually, however, the two-sided test is the more appropriate. 

The student may find that the standard errors of $\alpha_3$ and $\alpha_4$ and
the normal probability scale are occasionally used in testing the
assumption of normality. These standard errors are to a first
approximation $\sqrt{6/N}$ and $2\sqrt{6/N}$, but the sampling distributions
of the alpha statistics are not normal, that of $\alpha_4$ being markedly
nonnormal, except in very large samples. For this reason, Table G
should always be used, particularly for the $\alpha_4$ statistic. It will be
instructive for the student to compare the 90 and 98 per cent
sampling limits of the alphas obtained by standard error and
normal curve procedure with those shown in Table G.
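
The comparison suggested above can be carried out with a brief sketch. It assumes that the alpha statistics are the third and fourth moments about the mean expressed in standard-deviation units, so that a normal distribution has values of 0 and 3; the function names are illustrative, and 1.645 is the normal deviate that marks off the central 90 per cent.

```python
from math import sqrt


def alphas(scores):
    """Alpha measures of skewness and kurtosis of a set of scores
    (assumed here to be the standardized third and fourth moments)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def normal_curve_limits(n, z=1.645):
    """Sampling limits of the alphas by standard error and normal curve
    procedure, using the first-approximation standard errors."""
    se_alpha3 = sqrt(6 / n)
    se_alpha4 = 2 * sqrt(6 / n)
    alpha3_limits = (-z * se_alpha3, z * se_alpha3)
    alpha4_limits = (3 - z * se_alpha4, 3 + z * se_alpha4)
    return alpha3_limits, alpha4_limits


# 90 per cent limits for samples of 100 by the normal curve procedure.
alpha3_limits, alpha4_limits = normal_curve_limits(100)
```

For samples of 100 the normal curve procedure gives limits of about ±.40 for the skewness measure, close to the tabled ±.39, but about 2.19 and 3.81 for the kurtosis measure, as against the tabled 2.35 and 3.77. The symmetric limits misplace both ends, which reflects the nonnormal sampling distribution of the kurtosis measure.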

The t and F sampling distributions, as well as the normal, rest
upon the assumption of population normality. These distributions
will be discussed in later sections.

Effects of Nonnormality. Although normal sampling distribu- 
tion theory rests upon population normality, there is a great deal 
of evidence in support of the view that considerable departure 
from normality does not materially affect the sampling distribution 
of the mean. One of the many experiments in sampling from
nonnormal populations was noted on pages 207-209. The student can
verify that the parameters of that severely nonnormal population
are, to three-figure accuracy,

$$\tilde{M} = 504, \quad \tilde{\sigma} = 493, \quad \tilde{\alpha}_3 = 1.75, \quad \tilde{\alpha}_4 = 4.90.$$

The statistics of the distribution of the 1,000 means of samples of
25 each from this population, recorded in Table 5.3, are

$$M = 508, \quad \sigma_M = 101, \quad \alpha_3 = .33, \quad \alpha_4 = 3.06.$$

Thus, despite the severe nonnormality of the parent population,
the experimental sampling distribution approximates normality
rather closely. It has a mean which differs from the population
mean by less than 1 per cent, and a standard deviation which
differs from 98.6, the value of $\sigma_M$ obtained from formula (8.4), by
about 2½ per cent.
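
An experiment of this kind is easily repeated by machine. The sketch below is not the experiment of Table 5.3: it assumes a different skewed parent population (an exponential distribution with mean and standard deviation of 500, chosen only because it is easy to draw from), and the function names are illustrative.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def exponential_draw(rng, scale=500.0):
    """One score from an assumed, severely skewed parent population."""
    return rng.expovariate(1.0 / scale)


def simulate_means(draw, sample_size=25, n_samples=1000, seed=0):
    """Means of repeated random samples drawn from the parent population."""
    rng = random.Random(seed)
    return [mean(draw(rng) for _ in range(sample_size))
            for _ in range(n_samples)]


means = simulate_means(exponential_draw)

observed_mean = mean(means)       # should lie near the population mean, 500
observed_se = pstdev(means)       # should lie near the theoretical value
theoretical_se = 500.0 / sqrt(25)  # population sigma over root N = 100
```

Different seeds give slightly different results, but the mean of the sample means stays close to the population mean and their standard deviation stays close to the theoretical standard error, much as in the experiment described above.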

Many such experiments confirm the view that, when the size 
of the sample is about 30 or more and the population at least about 
10 times as large as the sample, the use of normal curve relation- 
ships in drawing inferences about the population mean is justified, 
despite considerable departure from normality in the population. 
Unfortunately, such experiments do not yield evidence which can 
be generalized either to other nonnormal populations or to the 
sampling distributions of other statistics. 

As a rule, when a sample of scores indicates definite nonnormality 
in the population, the investigator should attempt to transform 
the scores to some form which permits the assumption of normality. 
One possible normalizing transformation was demonstrated in 
Chapter III, in connection with the geometric mean. Mueller 
(Ref. 15) discusses various other transformations at some length. 

In addition to permitting logical tests of significance and esti- 
mation, normal distributions are more readily described and 
analyzed than nonnormal. However, when appropriate transforma- 
tions cannot be found, the usual tests of significance and estimation 
regarding population means appear to be, on experimental evidence, 
fairly trustworthy, even though the population departs quite con- 
siderably from normality. 

In recent years some progress has been made in the development 
of nonparametric or distribution-free methods of testing hypotheses. 
(See Ref. 3.) The binomial sign test described in the preceding 
section belongs with these methods. The methods require no assump- 
tions of population normality, but unfortunately are characterized 
by relatively low efficiency. Moreover, the methods do not as a 
rule permit satisfactory estimates of parameters. 

Effects of Errors of Measurement on Statistical Inference. 
In preceding discussions of statistical inference, no reference was
made to errors of measurement, it being implicitly assumed that
the obtained measures in the sample were perfectly reliable. In
applied statistics the assumption is, of course, never satisfied, and
the effect of errors of measurement may be of real consequence.

Suppose that we have the obtained measure or score on each of 
N individuals in a sample. As was brought out in Chapter VII, an 
error of measurement attaches to each of these scores, and conse- 
quently their mean may be in error as an estimate of the mean of 
the true scores of the individuals. In other words, if the individuals 
were measured a second time, or repeatedly, the scores and their 
mean would be likely to differ to some extent from those first 
obtained. 

If the errors of measurement are normally distributed and equally
variable throughout the range of obtained scores, the extent of
error attaching to a single score is indicated by the standard error
of measurement $\sigma_e$ and the extent of error attaching to the mean
of the set by $\sigma_e/\sqrt{N}$. By use of this latter error, it is possible to
estimate the confidence limits of the mean of true scores from the
mean of obtained scores.


EXAMPLE. The mean of the 138 VAT scores, Table II, Ap- 
pendix B, is 550.37 and the standard deviation is 92.73. The 
reliability coefficient of the test was .90. What are the 95 per 
cent confidence limits of the mean of the true scores of this 
sample? 

From the relationship in (7.9) we obtain

$$\sigma_e = 92.73\sqrt{1 - .90}$$

or about 29.3. Hence, $\sigma_e/\sqrt{N}$ = 2.50. The 95 per cent
confidence limits of the mean of the true scores are thus 550.37 ±
1.96 × 2.50 or about 545 and 555.


In this example the 95 per cent confidence limits of the true mean 
are about 1 per cent above and below the obtained mean, and the 
error occasioned by using the obtained mean in inference tends to 
be negligible. 
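
The steps of this example can be sketched as follows. The function names are illustrative; the standard error of measurement is taken as the standard deviation multiplied by the square root of one minus the reliability coefficient, as in the example.

```python
from math import sqrt


def se_measurement(sd, reliability):
    """Standard error of measurement of a single obtained score."""
    return sd * sqrt(1 - reliability)


def true_mean_limits(obtained_mean, sd, reliability, n, z=1.96):
    """Confidence limits of the mean of true scores of a sample of n,
    estimated from the mean of the obtained scores."""
    se_of_mean = se_measurement(sd, reliability) / sqrt(n)
    return obtained_mean - z * se_of_mean, obtained_mean + z * se_of_mean


# The 138 VAT scores: mean 550.37, standard deviation 92.73, reliability .90.
sigma_e = se_measurement(92.73, 0.90)
low, high = true_mean_limits(550.37, 92.73, 0.90, 138)

# Width of the limits relative to the obtained mean, in per cent.
relative_margin = 100 * (high - 550.37) / 550.37
```

The limits lie a little less than 1 per cent above and below the obtained mean. Lower reliability or a smaller sample widens them: with a reliability of .60 and only 10 scores, the same function gives limits about 36 points on either side of the mean.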

Unless N is quite small or $\sigma_e$ relatively large, the mean of a sample
of obtained scores is a satisfactory approximation to the mean of
the true scores and is therefore trustworthy in drawing inferences
about the mean of the population from which the sample of
individuals was taken.

If, however, N is small or $\sigma_e$ large, allowance for the possible error
in the mean of the sample of obtained scores should be made and
inferences interpreted accordingly. Obviously, if the decision to
accept or to reject an hypothesis would be changed if some other
probable value of the mean of obtained scores (or of the difference
between two means) were used in the critical ratio formula, the
decision is highly questionable. (See exr. 45.)

Errors of measurement and sampling errors, in the usual sense, 
cannot be considered simultaneously. Moreover, the effects of 
errors of measurement upon tests of significance cannot ordinarily 
be determined by supplementary calculations. In the preceding 
chapter, it was noted that the errors tend to inflate the standard 
deviation and to deflate or attenuate the correlation coefficient. 
These facts suggest that the standard errors of the statistics of 
Table 8.4 involving σ or r tend to be somewhat too large and that
critical ratios having such standard errors in their denominators 
tend to be somewhat too small. However, there are at present no 
satisfactory ways of correcting these tendencies. The best that can 
be done is to employ the most reliable instruments available and to 
be cautious in drawing inferences from measures of low reliability. 

Before leaving this topic let us note that the standard error of the
difference between two obtained scores is $\sqrt{\sigma_e^2 + \sigma_e^2}$ or $\sigma_e\sqrt{2}$. In
the above example, where $\sigma_e$ was about 29.3, the difference between
two obtained VAT scores would have to be about 81 to be
significant at the 5 per cent level. (Why?) Such tests of significance
are important mainly because they re-emphasize the need to in- 
terpret individual obtained scores in light of the standard error of 
measurement. 

Concluding Remarks. The normal sampling distribution, like 
the other sampling distributions discussed later in this chapter, has 
its chief use in research as a criterion of real differences between 
observation and the expectation established by an hypothesis. 

In the sampling situation, there are four plausible explanations of 
an apparent difference: (1) bias in method of observation, (2) vari- 
able errors of measurement, (3) sampling fluctuations, and (4) real 
differences between populations. The explanations are not mutually
exclusive. A test of significance is conclusive only if the original
observations are unbiased and reliable. The test indicates whether
the difference can reasonably be ascribed to sampling fluctuations,
nothing more.

The principal assumptions underlying normal sampling distribu- 
tion theory are normality in the sampled population and random- 
ness of sampling. These assumptions frequently are incompletely 
satisfied, and this circumstance adds to the uncertainty of inference. 
It has been argued that, in view of possibly unsatisfied assumptions 
and the quality of evidence typically available in education re- 
search, it would be well to insist upon a stringent level of significance, 
say the 1 per cent level, before concluding a difference is real. As 
a rule, such procedure would seem unwise. Since it is impossible to 
say what effect faulty evidence and unsatisfied assumptions have 
upon tests of significance, the procedure would presumably intro- 
duce as many inferential mistakes as it would prevent. To reduce 
the risk of rejecting a true hypothesis is inevitably to increase the 
risk of accepting a false one. The nature of the problem and the 
consequence of the decision to reject or to accept an hypothesis 
ordinarily suggest how to weigh and compromise the risks. As a rule,
there is no satisfactory way to make allowances for inaccurate data 
and poor sampling procedures. They must be prevented if demon- 
strably sound inferences are to be made. 

The normal sampling distribution is the backbone of so-called 
large sampling theory. It is widely useful, perhaps the most
generally useful sampling distribution available. When N is small,
however, it does not permit trustworthy inferences. What is needed is
a distribution that is independent of unknown population
parameters and is applicable to samples of any size. This brings us to the t
distribution of the next section.


Exercises 


32. Given the following data concerning the IQ's of a random sample of 
individuals from a specified population: $N = 225$, $M = 105$, $\sigma = 15$. 

a. If a large number of samples of 225 were taken at random from 
this population, about what would be the standard deviation of the 
distribution of the means of the samples? 

b. If $\tilde{M} = 103$, in what per cent of the samples would you expect to 
observe a mean of 105 or more? 

c. At what level of significance would you reject the hypothesis that 
$\tilde{M} = 103$? 

d. What are the 99 per cent confidence limits of $\tilde{M}$? 

33. The mean IQ of the sample of 293 eighth-graders of Table I, Appendix 
B, is 95.00 and the standard deviation is 13.20. What are the 95 per 
cent confidence limits of the population mean? Of the population 
standard deviation? 

34. Criticize each of the following: 

a. An investigator measured every member of a population. He 
reported both mean and $\sigma_M$. 

b. In a sample of 8 from a normal population an investigator found a 
mean of 50 and a $\sigma$ of 5. From these data he determined $\sigma_M$ and 
$\sigma_\sigma$, and estimated the 95 per cent confidence limits of $\tilde{M}$ and $\tilde{\sigma}$. 

35. The question arose in a large coeducational college as to whether the 
men students, as a rule, made better marks than the women. To 
investigate the question, the following data were collected in random 
samples of 81 men and 64 women: Mean grade index for men 3.82 with 
$\sigma$ of .90; mean grade index for women 3.25 with $\sigma$ of 1.04. 

a. What is the standard error of the men's mean? 

b. What is the standard error of the women's mean? 

c. What is the probability that the men and women students in the 
college are equal in achievement? That the men are superior? 

d. What is the probability that the men and women are equally variable 
in achievement? 

e. Can any of the above results be generalized to other colleges? 

36. In exr. 35, above, if the men in the college total 4,050 and the women 
total 3,200, what correction can be applied to the standard errors of 
the means? 

37. The data of Table I, Appendix A, may be considered a random sample 
from a population of eighth-grade pupils in a given city. 

a. How may the hypothesis be tested that the mean IQ in the 
population is 100? 

b. How may the hypothesis be tested that the ratio of boys to girls in 
the population is 45:55? 

c. If the coefficient of correlation between mental age and vocabulary 
turns out to be .78, how may the hypothesis be tested that the 
population coefficient is .85 or more? 

d. How may the assumption of normality of the population distribution 
of IQ's be tested? 

e. If the correlation between mental age and reading is computed in 
each of the 10 schools, how may an average of the coefficients best 
be found? 


38. Wechsler reports that the mean and standard deviation in a sample of 
50 8-year-olds on his block design subtest were 4.0 and 1.98, respec- 
tively, as compared to a mean of 10.9 and standard deviation 3.18 in 
a sample of 100 16-year-olds. Does this indicate a significant difference 
in variability? (Note: Since the means are so dissimilar the standard 
deviations should not be directly compared. Here, the coefficient of 
variation CV is appropriate. The standard error of CV is given in 
Table 8.4.) 

39. (Rietz) A group of scientific men reported 1,705 sons and 1,527 
daughters. Do these data conform to the hypothesis that 1/2 is the 
probability that a child to be born will be a boy? 

40. In an intelligence test, 25 students out of 100 answered item i and 
30 answered item j correctly. Sixty missed both items. How sure can 
you be that in future administrations of the test, individuals from the 
same population will find item i the more difficult? 


41. In Table 7.3, p. 360, 11 of the 16 students in the upper half of the group 
and 4 in the lower half responded correctly to item 13. Is it probable 
that this item will discriminate between upper and lower halves in 
further comparable samples? 

42. Test the significance of differences between proportions in exrs. 32 and 
33, pp. 265-266. 

43. The product-moment coefficients of correlation between two variables 
in three samples were: 

SAMPLE 1    N = 147    r = .66 
SAMPLE 2    N = 67     r = .58 
SAMPLE 3    N = 84     r = .70 


a. Test the difference between each pair of r's for significance. 
b. What is the best way to find the average of the three coefficients? 


44. Two groups of college students, 100 in each group, were matched for 
initial ability on a biology test. One group was taught by a lecture- 
demonstration method, the other by a lecture-laboratory method. At 
the end of the experimental period, the first group had a mean on an 
achievement test of 56 with $\sigma$ of 7, the other group had a mean of 54 
with $\sigma$ of 6. The coefficient of correlation between groups on the 
achievement test was .50. What can be concluded? 

45. In a study of adjustment, an investigator gave a personality inventory 
to 25 “only” children and to 36 children from large families. For the 
25, the mean and standard deviation were 52 and 10; for the 36, the 
mean and standard deviation were 58 and 12. The investigator divided 
the difference between means by its standard error, obtained a critical 
ratio of 2.12, and concluded that the difference was significant at the 
3 per cent level. The reliability coefficient of the inventory was .50. 
Show by considering the confidence limits of the means of the true 
scores of the samples that the conclusion is poorly supported by the 
data. 

46. In what way is the standard error of measurement $\sigma_e$ of an obtained 
score similar to the standard error of a statistic, such as $\sigma_M$? 

47. Show algebraically, or by application to a specific example, that 
formulas (8.19) and (8.20) are equivalent. 

48. Show that the critical ratio $(p_1 - p_2)/\sqrt{(d + a)/N}$ is equal to 
$(A - D)/\sqrt{A + D}$, where d, a, A, and D have their usual meaning in 
the fourfold correlation table. 

49. The statistics observed in a sample from a three-strata population 
were: 

                               STANDARD 
STRATUM    NUMBER    MEAN     DEVIATION 
   1          40     70.0        5.0 
   2         100     60.0        8.0 
   3          60     65.0        9.0 

a. Verify that, for the sample as a whole, $M = 63.5$ and $\sigma = 8.7$. 

b. Find $\sigma_M$, using formula (8.27). What are the 95 per cent confidence 
limits of the population mean? 

c. Find $\sigma_M$ without correction for stratification. What are the 95 per 
cent confidence limits of the population mean, using this $\sigma_M$? 

d. What is the advantage of stratified sampling when possible? 


50. Outline a problem involving the sampling of attributes from a twofold 
population which can be stratified. Discuss the method of sampling 
and the analysis of the data obtained. 


The t Sampling Distribution 


We have seen that the means of random samples from a normal 
population are distributed normally about the population mean $\tilde{M}$, 
with standard deviation $\tilde{\sigma}/\sqrt{N}$. Since $\tilde{M}$ and $\tilde{\sigma}/\sqrt{N}$ are constants, 
it follows that the critical ratio $(M - \tilde{M})\sqrt{N}/\tilde{\sigma}$ is distributed 
normally about zero with unit standard deviation. 

When $\tilde{\sigma}$ is unknown, as is usually the case, the sample standard 
deviation $\sigma$ is substituted for $\tilde{\sigma}$ in the critical ratio. Now the ratio 
$(M - \tilde{M})\sqrt{N}/\sigma$, having a variable numerator which is normally 
distributed and a variable denominator which has a skew distribution, 
is not normally distributed. As N increases, however, the distribution 
approaches normality rapidly. When N is greater than 
about 30, the error resulting from treating the ratio as though it 
were in fact normally distributed tends to be negligible, but when N 
is small the error cannot be safely ignored. In order to make sound 
inferences from small sample means, we need a sampling distribution 
which takes into account the size of the sample and which is 
independent of $\tilde{\sigma}$. 

The Distribution of the t Ratio. The exact sampling distribution 
of the ratio of a difference between sample and population 
means to its standard deviation was first investigated by the English 
statistician, W. S. Gosset, who wrote under the pen name 
“Student.” (See Ref. 23.) By somewhat empirical methods he obtained 
the sampling distribution of the ratio $(\bar{X} - \tilde{X})/s$, in which 
$\bar{X}$ and $\tilde{X}$ are sample and population means, respectively, and 
$s = \sqrt{\Sigma(X - \bar{X})^2/(N - 1)} = \sqrt{\Sigma x^2/(N - 1)}$. It can readily be 
shown that $s = \sigma\sqrt{N/(N - 1)}$. As previously noted, the standard 
deviation $\sigma$ of a sample tends to underestimate the standard deviation 
$\tilde{\sigma}$ of the population. The quantity s is thus better than $\sigma$ as an 
estimate of $\tilde{\sigma}$. 
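
The relation between the two standard deviations can be checked 
numerically. The following is a minimal Python sketch, not part of the 
original text; the five scores are hypothetical and serve only as an 
illustration. 

```python
import math
import statistics

# Hypothetical sample of N = 5 scores (illustrative only, not from the text).
scores = [38, 45, 41, 50, 36]
N = len(scores)

# sigma: standard deviation of the sample itself (divisor N).
sigma = statistics.pstdev(scores)

# s: estimate of the population standard deviation (divisor N - 1).
s = statistics.stdev(scores)

# The two quantities differ only by the factor sqrt(N / (N - 1)).
s_from_sigma = sigma * math.sqrt(N / (N - 1))

print(f"sigma = {sigma:.3f}")
print(f"s     = {s:.3f}")
print(f"sigma * sqrt(N/(N-1)) = {s_from_sigma:.3f}")
```

Because the factor exceeds 1 for every N, s is always somewhat larger 
than the sample standard deviation, and the difference shrinks as N grows. 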

Later statisticians, notably Fisher, building upon “Student's” 
foundation, determined the sampling distribution of a t ratio. This 
ratio, essentially similar to “Student's” ratio, is defined as 

$$t = \frac{(\bar{X} - \tilde{X})\sqrt{N}}{s}. \tag{8.29}$$ 


In an experimental study of the sampling distribution of the t 
ratio, 120 samples of 5 each were randomly selected from the population 
of scores, Table III, Appendix B, in which $\tilde{X}$ is 40. The 
difference between each sample mean $\bar{X}$ and the population mean 
$\tilde{X}$ was divided by the value of s in the sample and multiplied by 
$\sqrt{5}$. To illustrate, the first sample had a mean of 42.5 with s = 6.5, 
so that the t ratio for that sample was $(42.5 - 40)\sqrt{5}/6.5$ or 
about .9. The distribution of the 120 t ratios thus obtained is 
shown in Table 8.6 and in Fig. 8.13. The distribution is nearly 
symmetrical about zero ($\alpha_3 = -.03$), but is markedly leptokurtic 
($\alpha_4 = 3.86$). This experimental sampling distribution is in good 
agreement with the theoretical distribution, as indicated in the 
figure. 

TABLE 8.6 
DISTRIBUTION OF t RATIOS IN 120 RANDOM SAMPLES OF 5 
EACH FROM A NORMAL POPULATION 

t RATIO            f 
+4.3 to +4.7       1 
+3.8 to +4.2       0 
+3.3 to +3.7       1 
+2.8 to +3.2       2 
+2.3 to +2.7       0 
+1.8 to +2.2       3 
+1.3 to +1.7      10 
+0.8 to +1.2      12 
+0.3 to +0.7      18 
-0.2 to +0.2      20 
-0.7 to -0.3      16 
-1.2 to -0.8      13 
-1.7 to -1.3       8 
-2.2 to -1.8       8 
-2.7 to -2.3       3 
-3.2 to -2.8       2 
-3.7 to -3.3       2 
-4.2 to -3.8       0 
-4.7 to -4.3       1 
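
A sampling experiment of this kind is easy to imitate by machine. The 
sketch below is an illustration only, not the original study: the 
population standard deviation of 10 and the random seed are assumptions, 
since the text gives only the population mean of 40. It also recomputes 
the t ratio of the first sample quoted above. 

```python
import math
import random
import statistics

def t_ratio(sample_mean, pop_mean, s, n_obs):
    """t ratio of formula (8.29): (sample mean - population mean) * sqrt(N) / s."""
    return (sample_mean - pop_mean) * math.sqrt(n_obs) / s

# First sample described in the text: mean 42.5, s = 6.5, N = 5.
first = t_ratio(42.5, 40, 6.5, 5)

# Imitation of the experiment: 120 samples of 5 from a normal population.
# POP_SD and the seed are assumptions made for this sketch.
POP_MEAN, POP_SD, N = 40, 10, 5
rng = random.Random(1)
ratios = []
for _ in range(120):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    ratios.append(
        t_ratio(statistics.mean(sample), POP_MEAN, statistics.stdev(sample), N)
    )

# Count the ratios that a normal-curve reading would call significant at .05.
beyond = sum(1 for t in ratios if abs(t) > 1.96)

print(f"t ratio of the first sample in the text: {first:.2f}")
print(f"{beyond} of {len(ratios)} simulated ratios exceed 1.96 in absolute value")
```

With samples this small, noticeably more than 5 per cent of the ratios 
usually fall beyond 1.96, which is the leptokurtic behavior the table 
displays. 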

The equation of the theoretical sampling distribution of the t 
ratio is 

$$y_t = C\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \tag{8.30}$$ 

where 

$$C = \frac{[(n - 1)/2]!}{\sqrt{n\pi}\,[(n - 2)/2]!}$$ 

and n is the number of degrees of freedom attaching to s. More about 
degrees of freedom later. 

Fig. 8.13. Frequency polygon of the distribution of 
120 t ratios (Table 8.6) and curve of theoretical t distribution 
having 4 degrees of freedom. (Horizontal axis: t ratio; 
vertical axis: frequency.) 


Since the factorial expression for C is rather difficult to evaluate, 
we give below the values of C for several values of n. 


  n       C 
 [...]   .318 
 [...]   .354 
 [...]   .375 
 [...]   .380 
 [...]   .388 
 [...]   .393 
 [...]   .394 
 [...]   .396 


Using the C constant for any given n, the $y_t$ corresponding to 
selected values of t can be obtained from (8.30) and the curve of 
the distribution for that n plotted. The curves for n = 1, n = 2, 
and n = 5 are shown in Fig. 8.14, along with a normal curve drawn 
on the same scale. 
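
The factorial expression for C can also be evaluated directly with the 
gamma function, since $k! = \Gamma(k + 1)$ holds for the half-integer values 
that arise here. The following Python sketch is offered as an 
illustration under that identity; the function names are hypothetical. 

```python
import math

def c_constant(n):
    """C of formula (8.30): [(n-1)/2]! / (sqrt(n*pi) * [(n-2)/2]!).

    The factorials are evaluated as gamma functions, k! = gamma(k + 1).
    """
    return math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))

def t_ordinate(t, n):
    """Ordinate y_t of the t curve with n degrees of freedom, formula (8.30)."""
    return c_constant(n) * (1 + t * t / n) ** (-(n + 1) / 2)

for n in (1, 2, 4, 5, 30):
    print(f"n = {n:2d}   C = {c_constant(n):.3f}   y at t = 2: {t_ordinate(2.0, n):.4f}")
```

For n = 1 and n = 2 the sketch gives C = .318 and .354; as n increases, C 
approaches the central ordinate of the unit normal curve, about .399. 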
Several important features of the t distribution can be seen in 
equation (8.30) and Fig. 8.14. The distribution is obtained from 
the joint variation of $\bar{X} - \tilde{X}$ and s in successive samples; hence it 
is independent of $\tilde{\sigma}$. It is symmetrical about zero and is unimodal. 
The shape of its curve depends upon n, the number of degrees of 
freedom attaching to s. (In the present case, the value of n is the 
size of the sample less 1, i.e., n = N - 1.) In effect there are many t 
distributions, one corresponding to each n, but as n becomes larger 
the curve approaches normal form rapidly. When n is greater than 
about 30, the normal curve is not a bad approximation to the t 
curve, and the approximation becomes steadily better as n increases. 
For n small, the curve is decidedly leptokurtic, with larger tails 
than the normal curve. 

Fig. 8.14. t curves for n = 1, n = 2, n = 5 and normal curve. 
(Horizontal axis: values of t; vertical axis: relative frequency.) 

Since the form of the t distribution changes with n, the area 
under the curve subtended by given intervals on the base line 
varies with n. Hence, the probability figure corresponding to a 
given t ratio depends upon n. Table D, Appendix C, is a typical 
table of t ratios corresponding to specified probability figures for 
various n's. As noted at the foot of the table, the probability figures 
are based upon both tails of the distribution. By relating the curves 
in Fig. 8.15 to Table D and working exercises 51 and 52 the student 
can readily become proficient in the use of the table. 

We shall take up the use of the t distribution in testing hypotheses 
and in estimation after we look into the meaning of degrees of 
freedom. 

Fig. 8.15. The two-sided .05 regions of rejection in the t sampling 
distribution for n = 2, n = 4, and n = 30. (Regions of acceptance: 
t ratios between -4.30 and 4.30 for n = 2, between -2.78 and 2.78 
for n = 4, and between -2.04 and 2.04 for n = 30.) 
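
The dependence of the tabled t ratios upon n can be illustrated by 
integrating the curve of (8.30) numerically. The sketch below is not the 
method by which Table D was prepared; it assumes Simpson's rule for the 
area and bisection for the search, and its function names are 
hypothetical. 

```python
import math

def c_constant(n):
    """Constant C of formula (8.30), written with gamma functions."""
    return math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))

def two_tailed_p(t, n, steps=4000):
    """Area in both tails beyond +t and -t, by Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    c = c_constant(n)
    power = -(n + 1) / 2

    def ordinate(x):
        return c * (1 + x * x / n) ** power

    h = t / steps  # steps is even, as Simpson's rule requires
    total = ordinate(0.0) + ordinate(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * ordinate(i * h)
    central_area = 2 * total * h / 3  # area between -t and +t
    return 1 - central_area

def critical_t(p, n):
    """t ratio whose two-tailed probability equals p, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, n) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (2, 4, 30):
    print(f"n = {n:2d}: two-sided .05 point = {critical_t(0.05, n):.2f}")
```

The three values agree with the boundaries of the regions of acceptance 
in Fig. 8.15: 4.30, 2.78, and 2.04. 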
Degrees of Freedom. The complete explanation of the concept 
of degrees of freedom lies in advanced statistical theory, but an 
understanding of the concept for practical purposes is not difficult 
to acquire. 

The number of degrees of freedom of an estimate of a parameter 
always refers to the number of independent values which contribute 
to the estimate. Let us see why this number is N - 1 for the 
estimate s appearing in the t ratio of (8.29). 

Suppose that we are sampling from a normal population of scores 
with standard deviation $\tilde{\sigma}$ and that the scores are in deviation form. 
(This means merely that the mean of the population has been 
subtracted from each score.) If we had all of these scores in the 
population, they would of course sum to zero. A sample of deviation 
scores, however, would not ordinarily sum to zero; in fact, if there 
were no restrictions, the sum of a sample of deviation scores might 
be very different from zero owing to sampling fluctuations. One 
restriction which we could place on a sample of N deviation scores, 
before using them to estimate $\tilde{\sigma}$, would be that they sum to zero, 
i.e., $\Sigma x = 0$. In other words, we could force the sample and population 
to agree in mean value and thus make the sample more representative 
of the population than it otherwise might be. But if this 
restriction were made, one of the N deviation scores would not be 
independent. The student can easily convince himself that this is 
so by arbitrarily fixing N - 1 deviation scores and noting that the 
Nth is always determined by the sum of the others and the restriction 
$\Sigma x = 0$. 

We may look at this in a somewhat different way, but with the 
same results. In estimating the standard deviation $\tilde{\sigma}$ of a population 
from a sample of N scores, we cannot use the deviations $X - \tilde{X}$ 
of the scores from the population mean, since this mean is unknown. 
We therefore take the deviations $X - \bar{X}$ of the scores from the 
sample mean and, in effect, force population and sample means to 
agree in estimating $\tilde{\sigma}$. The restriction results in the loss of one 
degree of freedom, for reasons noted in the paragraph above. It is 
always the case that one degree of freedom is lost for each population 
constant, such as a mean, which must be determined from 
the original data. 

Since N - 1 of the sample data are free to vary, N - 1 contribute 
independently to the value of s. Consequently, the variation 
of s from sample to sample depends upon the variation not of N 
but of N - 1 of the sample data. The variation of the t ratio of 
(8.29), for any $\tilde{X}$, in turn depends upon the variation of the 
denominator s. Thus, the distribution of that ratio has N - 1 degrees 
of freedom. 

The number of degrees of freedom of an estimate of a parameter 
is not always one less than the number of original observations; 
the decrease depends upon the number of independent restrictions 
necessary in arriving at the estimate or, what is the same thing, the 
number of constants which must be determined from the observations. 
The underlying principle, however, is always the same. 
Essentially the principle is that a statistic, as an estimate of a 
parameter, has degrees of freedom equal in number to the number of 
[...] common sense as well as sound theory. The principle may be stated 
in any of the following ways: The number of degrees of freedom 
of a statistic, taken as an estimate of a parameter, is equal to 

a. The number of observations less the number of independent 
restrictions placed upon them in calculating the statistic. For 
example, in calculating s we have N observations (N deviation 
scores) which have the single restriction that they must sum to 
zero. 

b. The number of observations minus the number of constants 
determined from them used in calculating the statistic. In 
calculating s we use N observations and one constant, the mean, 
determined from the observations. 

c. The number of observations contributing to the value of the 
statistic that are free to vary. In determining s, N - 1 observations 
are free to vary, only one being fixed. 

The statements mean the same thing, namely, that the number 
of degrees of freedom of a sample statistic is the number of 
independent observations which determine the value of the statistic. 
We shall find that the $\chi^2$ and F statistics, as well as t, involve degrees 
of freedom. 
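
The loss of one degree of freedom in computing s can be seen in a small 
numerical check. The sketch below uses five hypothetical scores; it 
shows only that, once deviations are taken from the sample mean, the 
last deviation is fixed by the other N - 1. 

```python
import statistics

# Hypothetical sample of N = 5 scores (illustrative only).
scores = [105, 98, 120, 95, 115]
sample_mean = statistics.mean(scores)

# Deviations from the sample mean are restricted to sum to zero.
deviations = [x - sample_mean for x in scores]

# Fix the first N - 1 deviations; the restriction then determines the Nth.
free_deviations = deviations[:-1]
implied_last = -sum(free_deviations)

print("deviations:", [round(d, 1) for d in deviations])
print("sum of deviations:", round(sum(deviations), 10))
print("last deviation implied by the others:", round(implied_last, 1))
```

Whatever values are chosen for the first four deviations, the fifth is 
not free to vary, so only N - 1 = 4 values contribute independently to s. 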

Inferences from a Small Sample Mean. The use of the t distribution 
in drawing inferences about a normal population mean, 
say $\tilde{X}$, from a small sample involves only the calculation of the 
t ratio and the referral of the ratio to Table D, under the appropriate 
number of degrees of freedom. It is to be remembered that the 
probability figures corresponding to the tabled values of t are 
based upon both tails of the distribution. 

EXAMPLE. In a random sample of 10 from a population of 
normally distributed IQ's the following IQ's are observed: 105, 
98, 120, 95, 115, 100, 110, 125, 92, 130. Is the evidence provided 
by this sample consistent with the view that the mean in the 
population is 100? 

The mean of the sample is 109, and $\Sigma(X - \bar{X})^2 = \Sigma x^2 = 1{,}558$. 
Hence, $s = \sqrt{1{,}558/9} = 13.2$. The hypothesis to be 
tested is that $\tilde{X} = 100$. By (8.29) we have the ratio 

$$t = \frac{(109 - 100)\sqrt{10}}{13.2} = 2.15,$$ 

with degrees of freedom n = 9. Entering Table D at n = 9, 
we note that, since 2.15 falls between 1.83 and 2.26, .10 > 
P > .05. Although the hypothesis is in some doubt, we cannot 
reject it at the 5 per cent level. 

Note that the hypothesis $\tilde{X} \le 100$ can be rejected at the 
5 per cent level. In testing this hypothesis, the (c) region of 
rejection, Fig. 8.2, is appropriate. The probability of a ratio 
of +2.15 or more, if the hypothesis is true, is 1/2 that stated in 
Table D. Thus, for the hypothesis $\tilde{X} \le 100$, .05 > P > .025. 
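
The arithmetic of the example can be retraced in a few lines of Python. 
This is a sketch of the computation only; the tabled values 1.83 and 
2.26 are those quoted in the example for n = 9. Carried without rounding 
s to 13.2, the ratio comes out near 2.16 rather than 2.15, which does 
not affect the conclusion. 

```python
import math

iqs = [105, 98, 120, 95, 115, 100, 110, 125, 92, 130]
hypothesized_mean = 100

N = len(iqs)
sample_mean = sum(iqs) / N
sum_sq_dev = sum((x - sample_mean) ** 2 for x in iqs)

# s: estimate of the population standard deviation, N - 1 degrees of freedom.
s = math.sqrt(sum_sq_dev / (N - 1))

# t ratio of formula (8.29).
t = (sample_mean - hypothesized_mean) * math.sqrt(N) / s

print(f"mean = {sample_mean:.0f}, sum of squares = {sum_sq_dev:.0f}, s = {s:.1f}")
print(f"t = {t:.2f} with {N - 1} degrees of freedom")
print("between the tabled .10 and .05 values:", 1.83 < t < 2.26)
```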


It is instructive to compare the probability figure from Table D 
with that we would have obtained had we treated 2.15 as a critical 
ratio and referred it to the normal probability scale. The latter 
procedure yields a P of about .03 for the hypothesis $\tilde{X} = 100$. 
The reason for the discrepancy is apparent when the tails of the 
curve for n = 9 are compared with those of the normal curve. For 
small samples, normal sampling distribution methods give 
probabilities which are substantially too small and thus result in the 
rejection of hypotheses more often than is justified. 

Unlike the normal sampling distribution, the t distribution does 
not permit a general statement or formula for the confidence limits 
of a population mean. This is because the width of a specified 
interval varies with n; the smaller the value of n the wider the interval. 

To estimate, say, the 95 per cent confidence limits of $\tilde{X}$ from a 
given sample of size N, we must find the value of t in Table D 
corresponding to the probability figure .05 at n (= N - 1) and 
substitute in formula (8.29). Applying this procedure in the above 
example we would have 

$$\pm 2.26 = \frac{(109 - \tilde{X})\sqrt{10}}{13.2},$$ 

from which, solving for $\tilde{X}$, we would obtain 109 ± 9.4 or 99.6 and 
118.4 as the 95 per cent confidence limits. To estimate the 99 per 
cent confidence limits, we would set t = ±3.25 and proceed as 
above. 
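
Solving formula (8.29) for the population mean amounts to adding and 
subtracting t times $s/\sqrt{N}$ from the sample mean. The sketch below applies 
this to the IQ example; the t values 2.26 and 3.25 are those cited from 
Table D for n = 9, and the function name is hypothetical. 

```python
import math

def confidence_limits(sample_mean, s, n_obs, t_value):
    """Limits obtained by solving (8.29) for the population mean at +/- t_value."""
    half_width = t_value * s / math.sqrt(n_obs)
    return sample_mean - half_width, sample_mean + half_width

# Figures from the IQ example: mean 109, sum of squares 1,558, N = 10.
sample_mean, N = 109, 10
s = math.sqrt(1558 / (N - 1))

lower_95, upper_95 = confidence_limits(sample_mean, s, N, 2.26)
lower_99, upper_99 = confidence_limits(sample_mean, s, N, 3.25)

print(f"95 per cent limits: {lower_95:.1f} and {upper_95:.1f}")
print(f"99 per cent limits: {lower_99:.1f} and {upper_99:.1f}")
```

The 95 per cent limits reproduce the 99.6 and 118.4 of the text; the 
99 per cent limits are necessarily wider. 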
Ordinarily the most convenient computational form of formula 
(8.29) is 

$$t = \frac{\bar{X} - \tilde{X}}{\sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N(N - 1)}}}, \tag{8.31}$$ 

in which X represents the raw or gross scores. It is left as an exercise 
for the student to derive (8.31) from (8.29). 

The t Test of Differences between Means of Two Independent 
Samples. Fisher (Ref. 7) has shown that the t distribution 
has broader application than was originally contemplated. He has 
demonstrated that any ratio whose numerator is a normal deviate 
and whose denominator is an independent estimate of the standard 
deviation of the numerator is distributed as t with degrees 
of freedom n equal to the number of degrees of freedom 
contributing to the estimate of the standard deviation. Among the 
more important of these applications is the so-called t test of the 
difference between sample means. 

In random, independent samples from normal populations with 
the same mean and the same standard deviation $\tilde{\sigma}$, the statistic, 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s\sqrt{1/N_1 + 1/N_2}}, \tag{8.32}$$ 

where the X's and N's have their usual meaning and s is the best 
estimate of $\tilde{\sigma}$ provided by the two samples, follows the t distribution 
with n (= $N_1 + N_2 - 2$) degrees of freedom. Thus, inferences about 
differences between two population means can be made from the t 
ratio. 

The estimate s is obtained by adding the separate sums of squares 
and dividing by $N_1 + N_2 - 2$, i.e., 

$$s^2 = \frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2 - 2}.$$ 

It will be noted that s is based upon two sums of squares. Since one 
degree of freedom is lost in computing each sum, the number of 
independent observations contributing to s is $N_1 + N_2 - 2$. Substituting 
in (8.32), we have 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}. \tag{8.33}$$ 

The sum of squares $\Sigma x^2$ in a sample usually is most easily obtained 
from the raw scores by the formula 

$$\Sigma x^2 = \frac{1}{N}\left[N\Sigma X^2 - (\Sigma X)^2\right].$$ 


As an illustration of the t test, we have the following example: 

EXAMPLE. In a transfer of training experiment, two small 
groups, of 5 and 6 students, were randomly selected. One 
group received intensive training in a certain method of solving 
algebra problems, the other group in a second method. At 
the end of the experimental period, both groups were given a 
test containing 20 original problems. The scores on the test 
are shown below. Is the difference between the means of the 
groups significant? 

GROUP I ($N_1 = 5$)        GROUP II ($N_2 = 6$) 
  $X_1$     $X_1^2$              $X_2$     $X_2^2$ 
   18      324                13      169 
   17      289                 9       81 
   15      225                 9       81 
   10      100                 7       49 
    6       36                 6       36 
                               6       36 
SUM 66     974            SUM 50      452 

The hypothesis to be tested is $\tilde{M}_1 - \tilde{M}_2 = 0$. To test the 
hypothesis we compute 

$$\bar{X}_1 = \frac{66}{5} = 13.2, \qquad \bar{X}_2 = \frac{50}{6} = 8.3,$$ 

$$\Sigma x_1^2 = \frac{1}{5}\left[5(974) - (66)^2\right] = 102.8, \qquad \Sigma x_2^2 = \frac{1}{6}\left[6(452) - (50)^2\right] = 35.3.$$ 

Substituting in (8.33) we have 

$$t = \frac{13.2 - 8.3}{\sqrt{\dfrac{102.8 + 35.3}{5 + 6 - 2}\left(\dfrac{1}{5} + \dfrac{1}{6}\right)}} = 2.07.$$ 

Entering Table D at n = 5 + 6 - 2 = 9, we find that .10 > 
P > .05. The difference is not significant at the 5 per cent 
level. There is insufficient reason to conclude that the population 
means are unequal. So far as these data are concerned, 
no real difference between methods is demonstrated. 


If desired, formula (8.33) may be used to determine confidence 
limits of the difference between population means. After the 
denominator has been obtained, it is multiplied by the t value 
corresponding to the selected limits and the product is subtracted from 
and added to the difference between sample means. The 95 per cent 
confidence limits of the difference between population means in the 
above example are (13.2 - 8.3) ± 2.26(2.37) or about -.5 and 10.3. 
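
The whole of the two-group computation, including the confidence 
limits, can be sketched in Python as follows. The scores are those of 
the transfer of training example; 2.26 is the tabled t for n = 9 cited 
above. Because the sketch does not round the means to one decimal, it 
gives a t of about 2.05 and an upper limit of about 10.2, slightly below 
the rounded figures of the text. 

```python
import math

group_1 = [18, 17, 15, 10, 6]
group_2 = [13, 9, 9, 7, 6, 6]

def sum_of_squares(raw_scores):
    """Sum of squared deviations from raw scores: (1/N)[N*sum(X^2) - (sum X)^2]."""
    n = len(raw_scores)
    return (n * sum(x * x for x in raw_scores) - sum(raw_scores) ** 2) / n

n1, n2 = len(group_1), len(group_2)
mean_1, mean_2 = sum(group_1) / n1, sum(group_2) / n2
ss_1, ss_2 = sum_of_squares(group_1), sum_of_squares(group_2)

# Pooled estimate s with N1 + N2 - 2 degrees of freedom.
df = n1 + n2 - 2
s = math.sqrt((ss_1 + ss_2) / df)

# Denominator of formula (8.33): standard error of the difference.
se_diff = s * math.sqrt(1 / n1 + 1 / n2)
difference = mean_1 - mean_2
t = difference / se_diff

# 95 per cent confidence limits of the difference between population means.
lower = difference - 2.26 * se_diff
upper = difference + 2.26 * se_diff

print(f"sums of squares: {ss_1:.1f} and {ss_2:.1f}")
print(f"standard error of the difference: {se_diff:.2f}")
print(f"t = {t:.2f} with {df} degrees of freedom")
print(f"95 per cent limits of the difference: {lower:.1f} and {upper:.1f}")
```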

The t test is applicable to samples of any size; unless the N's are 
small, however, the exact results obtained through its use do not 
differ materially from the approximate results obtained by critical 
ratio procedures. The assumptions underlying the t test are 
randomness of sampling and population normality. The soundness of the 
latter assumption cannot be satisfactorily examined when the 
samples are small, but must be judged from what is known a priori 
about the population or what can be deduced from the nature of the 
measurements. Bartlett (Ref. 1) has shown that considerable 
departure from normality in the sampled populations does not 
invalidate the test. 

The most noteworthy limitation of the t test results from the joint 
nature of the t ratio of (8.32). If the sampled populations are in fact 
equally variable, a significant value of t indicates a significant 
difference between means; however, a significant value of t could arise 
in sampling from populations having equal means but different 
standard deviations. In order to be sure that a significant value of 
t indicates a difference between population means, it is necessary 
first to show that the samples do not differ significantly in 
variability. This can be done by application of a test which will be taken 
up in connection with the F distribution. Unless it can be said that 
the sampled populations are alike in variability, no inferences 
concerning differences between means are warranted. Moreover, unless 
this can be shown, the t test cannot logically be applied. If the 
populations are different in variability, the estimate s of (8.32) has no 
clear meaning. 

It will be recalled that the critical ratio test of differences between 
means does not require that the populations be of equal variability. 
When the samples are large enough to justify the CR technique, 
one can determine whether the difference between two sample 
means is significant, irrespective of possible significant difference 
between variabilities. The advantage, however, is more apparent 
than real. As a rule, comparisons of central tendencies are unfair 
and misleading unless variabilities are equal or nearly so. This 
point was emphasized in Chapter IV. 

The t Test of Differences between Means of Two Correlated 
Samples. Consider a situation in which the measures $X_i$ in one 
sample are paired with the measures $Y_i$ in a second, as is the case 
in matched-individuals experiments and in experiments where the 
same individuals are measured before and after some treatment. 

We can form a series of N differences $(X_1 - Y_1), (X_2 - Y_2), 
\ldots, (X_N - Y_N)$. It may readily be shown that the mean $\bar{D}$ of the 
differences is the same as the difference $\bar{X} - \bar{Y}$ between the means 
of $X_i$ and $Y_i$. Now we may conceive of these differences $D_1, D_2, \ldots, 
D_N$ as a sample of size N from a population of differences, in which 
the mean $\tilde{D}$ is the same as $\tilde{X} - \tilde{Y}$. Assuming normality in the 
population of differences, the ratio 

$$t = \frac{(\bar{D} - \tilde{D})\sqrt{N}}{s_D} = \frac{\bar{D} - \tilde{D}}{\sqrt{\dfrac{\Sigma D^2 - (\Sigma D)^2/N}{N(N - 1)}}} \tag{8.34}$$ 

is distributed as t with N - 1 degrees of freedom. Thus, hypotheses 
regarding the value of the difference $\tilde{X} - \tilde{Y}$ may be tested, 
utilizing the differences between paired scores. For an example, let us 
return to the data of Table 8.5. There the mean $\bar{D}$ of the differences 
between scores was -1.43. To test the hypothesis $\tilde{D} = 0$, we 
substitute the appropriate data at the foot of the column of differences 
of Table 8.5 in formula (8.34) and have 

$$t = \frac{-1.43 - 0}{\sqrt{\dfrac{\Sigma D^2 - (-40)^2/28}{28(28 - 1)}}}$$ 

and enter Table D at n = 27. We find .20 > P > .10, and thus 
have insufficient reason to reject the hypothesis. 

Confidence limits of the difference $\tilde{X} - \tilde{Y}$ or, what is the same 
thing, $\tilde{D}$, may be found as usual. 

It will be noted that, in the paired scores situation, the assumption 
that $\tilde{\sigma}_x = \tilde{\sigma}_y$ is unnecessary. The only assumption needed to validate 
the procedure is that the sample of differences is randomly taken 
from a normal population of differences. 
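
The paired-scores procedure may be sketched in Python as below. The six 
pairs of scores are hypothetical, since the data of Table 8.5 are not 
reproduced in this section; the sketch shows only the form of the 
computation for the hypothesis that the population mean difference is 
zero. 

```python
import math

# Hypothetical paired scores (illustrative only, not the data of Table 8.5).
before = [12, 15, 9, 14, 11, 13]
after = [14, 15, 12, 13, 15, 16]

differences = [x - y for x, y in zip(before, after)]
N = len(differences)
mean_d = sum(differences) / N

# The mean of the differences equals the difference between the means.
diff_of_means = sum(before) / N - sum(after) / N

# Standard error of the mean difference, from the raw differences.
sum_d = sum(differences)
sum_d_sq = sum(d * d for d in differences)
se_mean_d = math.sqrt((sum_d_sq - sum_d ** 2 / N) / (N * (N - 1)))

# t ratio for the hypothesis that the population mean difference is zero.
t = (mean_d - 0) / se_mean_d

print(f"mean difference = {mean_d:.2f} (difference of means = {diff_of_means:.2f})")
print(f"t = {t:.2f} with {N - 1} degrees of freedom")
```

Only the column of differences enters the computation, which is why no 
assumption about the equality of the two standard deviations is needed. 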

Correlations Involving a Common Variable in a Single 
Sample. A problem which frequently arises in test selection is that 
of determining whether in a given sample the coefficient of correlation 
$r_{12}$ between a criterion variable and a predictor variable is 
significantly more or less than the coefficient $r_{13}$ between the same 
criterion and a second predictor. 

Under the assumption that the criterion variable is homoscedastic 
and normally distributed for each set of values of the predictor 
variables, Hotelling shows (Ref. 9) that, in all possible samples for 
which the predictors have the same set of values as those observed 
in the given sample, the statistic 

$$t = (r_{12} - r_{13})\sqrt{\frac{(N - 3)(1 + r_{23})}{2(1 + 2r_{12}r_{13}r_{23} - r_{12}^2 - r_{13}^2 - r_{23}^2)}} \tag{8.35}$$ 

follows the t distribution with N - 3 degrees of freedom. The theory 
is illustrated in the following example. 

EXAMPLE. In the class of 138 freshmen, Table II, Appendix B, 
the following correlations are present: semester averages with 
VAT scores, $r_{12} = .46$; semester averages with MAT scores, 
$r_{13} = .30$; VAT scores with MAT scores, $r_{23} = .28$. Is $r_{12}$ 
significantly greater than $r_{13}$? 

The hypothesis to be tested is that $\tilde{r}_{12} - \tilde{r}_{13} \le 0$. Substituting 
in (8.35) we have 

$$t = (.46 - .30)\sqrt{\frac{(135)(1 + .28)}{2[1 + 2(.46)(.30)(.28) - (.46)^2 - (.30)^2 - (.28)^2]}} = 1.78.$$ 

Referring to the t table, we find that the hypothesis can be 
rejected at the 5 per cent level. (Note that a one-sided region 
of rejection is appropriate for this hypothesis.) The correlation 
of VAT with semester averages is significantly higher than 
that of MAT. 
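
Formula (8.35) is tedious by hand but short in code. The following 
sketch evaluates it for the figures of the example; the function name 
is hypothetical. 

```python
import math

def hotelling_t(r12, r13, r23, n_obs):
    """t of formula (8.35) for two correlations sharing variable 1; N - 3 d.f."""
    determinant = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    return (r12 - r13) * math.sqrt((n_obs - 3) * (1 + r23) / (2 * determinant))

# Figures of the freshman example: N = 138, r12 = .46, r13 = .30, r23 = .28.
t = hotelling_t(0.46, 0.30, 0.28, 138)
print(f"t = {t:.2f} with {138 - 3} degrees of freedom")
```

The result, 1.78, agrees with the value obtained in the example. 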


It will be seen that the conditions of the above test are quite 
restrictive. However, when sample size is large or moderately so, 
the conditions presumably can be relaxed sufficiently to permit use 
of the test in most practical situations. At present, it appears to be 
the most convenient and useful test available for the hypothesis in 
question. 

Significance of Regression and Correlation Coefficients. 
Several correlation statistics have sampling distributions which 
follow the t distribution. One of these is $b_{yx}$, the coefficient of 
regression of Y on X. In sampling from a normal bivariate population the 
ratio 

$$t = \frac{(b_{yx} - \tilde{b}_{yx})\,\sigma_x\sqrt{N - 2}}{\sqrt{\sigma_y^2 - b_{yx}^2\sigma_x^2}} \tag{8.36}$$ 

is distributed as t with N - 2 degrees of freedom. The number of 
degrees of freedom is two less than the number of pairs of data 
because in fitting a line of regression two constants, the slope and 
the intercept, must be determined from the data. 

To test whether a sample value $b_{yx}$ departs significantly from a 
proposed population value $\tilde{b}_{yx}$, we substitute in (8.36) and refer the 
resulting quotient to a table of t under n = N - 2 degrees of 
freedom. 

EXAMPLE. For the illustrative data of Table 6.9, p. 268, N = 23, 
$b_{yx} = .25$, $\sigma_y = 4.81$, and $\sigma_x = 11.41$. Is it probable that this 
regression coefficient arose by chance in a sample from a population 
in which $\tilde{b}_{yx} = 0$? 

Substituting in (8.36) we have 

$$t = \frac{(.25 - 0)(11.41)\sqrt{21}}{\sqrt{(4.81)^2 - (.25)^2(11.41)^2}} = \frac{13.1}{3.87} = 3.39.$$ 

Entering Table D at n = 21, we find .01 > P > .001, and 
conclude that it is quite improbable that $\tilde{b}_{yx}$ is 0. 

If $\sigma_y$ and $\sigma_x$ are interchanged, formula (8.36) may be used to test 
the significance of the regression coefficient $b_{xy}$. 
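
The regression example may be verified with the short sketch below, 
which evaluates formula (8.36) from the figures quoted for Table 6.9. 
The function name is hypothetical. Without intermediate rounding the 
ratio is about 3.38 rather than 3.39; the conclusion is unchanged. 

```python
import math

def regression_t(b, b_hypothesized, sd_y, sd_x, n_pairs):
    """Numerator, denominator and t of formula (8.36); N - 2 degrees of freedom."""
    numerator = (b - b_hypothesized) * sd_x * math.sqrt(n_pairs - 2)
    denominator = math.sqrt(sd_y ** 2 - b ** 2 * sd_x ** 2)
    return numerator, denominator, numerator / denominator

# Figures quoted for Table 6.9: N = 23, b = .25, sigma_y = 4.81, sigma_x = 11.41.
numerator, denominator, t = regression_t(0.25, 0.0, 4.81, 11.41, 23)

print(f"numerator = {numerator:.1f}, denominator = {denominator:.2f}")
print(f"t = {t:.2f} with {23 - 2} degrees of freedom")
```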

In sampling from a normal bivariate population in which $\tilde{r}_{xy} = 0$, 
the ratio 

$$t = \frac{r_{xy}\sqrt{N - 2}}{\sqrt{1 - r_{xy}^2}} \tag{8.37}$$ 

is distributed as t with N - 2 degrees of freedom. Only the hypothesis 
that $\tilde{r}_{xy} = 0$ can be tested by formula (8.37). Other hypotheses 
regarding the value of $\tilde{r}_{xy}$ must be tested by the $z_r$ technique, 
previously described. Formula (8.37) may be readily adapted to test 
whether a partial correlation coefficient is significantly different 
from zero. The first-order coefficient $r_{12.3}$ involves N - 3 degrees 
of freedom; the second-order coefficient $r_{12.34}$ involves N - 4 
degrees of freedom; and so on. 
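
A sketch of this test in Python follows. The coefficient .50 and the 
sample size 27 are hypothetical values chosen for illustration, and the 
`order` argument is an assumed convenience for the partial-correlation 
case, in which one further degree of freedom is lost for each variable 
held constant. 

```python
import math

def correlation_t(r, n_obs, order=0):
    """t for the hypothesis of zero population correlation.

    order = 0 gives formula (8.37) with N - 2 degrees of freedom;
    order = 1 or 2 adapts it to first- or second-order partial coefficients.
    """
    df = n_obs - 2 - order
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2), df

# Hypothetical figures: r = .50 in a sample of 27 pairs.
t_simple, df_simple = correlation_t(0.50, 27)
t_partial, df_partial = correlation_t(0.50, 27, order=1)

print(f"zero-order:  t = {t_simple:.2f} with {df_simple} degrees of freedom")
print(f"first-order: t = {t_partial:.2f} with {df_partial} degrees of freedom")
```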

When the number of pairs of observations N is about 8 or more, 
the significance of the rank difference correlation coefficient $r_d$ may 
be tested satisfactorily by referring the ratio 

$$t = \frac{r_d\sqrt{N - 2}}{\sqrt{1 - r_d^2}} \tag{8.38}$$ 

to a table of t under N - 2 degrees of freedom. The significance of 
$r_d$ in samples of size less than 8 is best tested by methods described 
in Ref. 3, p. 260. 

The t test of the significance of the point biserial coefficient of 
formula (6.5) is the same as that of $r_d$ given in (8.38) above. 

The t distribution may be used in testing the significance of 
various other correlation statistics in small samples. Rider (Ref. 18) 
illustrates the application to multiple regression coefficients and 
to the difference between regression coefficients. 
The t technique makes it possible to infer soundly from quite 
small samples the presence or absence of correlation in the 
population, and this has perhaps tended to obscure the necessity of 
large samples in dependable correlation analysis. Correlation and 
regression coefficients in small samples tend to be poor estimates 
of the corresponding parameters, and consequently provide little 
reliable information regarding the amount of variation in the 
dependent variable which is explained by the independent variable(s). 
A small sample regression equation tends to be a poor approximation 
of the population regression equation. In the small sample 
situation, predictions are subject not only to the usual error of estimate 
but to the relatively large sampling errors which infest the regression 
equation. It is to be remembered, particularly in correlation work, 
that the significance and the reliability of sample statistics are two 
quite different matters. 

Concluding Remarks. The t distribution is the exact distribution
of the ratio of a normal deviate to its estimated standard
deviation. It is applicable to samples of any size, from 2 upward, 
drawn at random from a normal population. Its limitations are 
due, not to its own properties, but to the uncertain nature of small 
sample statistics. Yule and Kendall (Ref. 26, p. 485) remark: 


It cannot be over-emphasized that estimates from small samples are 
of little value in indicating the true value of the parameter which is 
estimated. Some estimates will be better than others, but no estimate 
is very reliable. In the present state of our knowledge this is particularly 
true of samples from populations which are suspected not to be normal. 

Nevertheless, circumstances sometimes drive us to base inferences, 
however tentatively, on scanty data. In such cases we can rarely, if ever, 
make any confident attempt at locating the value of a parameter within 
serviceably narrow limits. For this reason we are usually concerned, in 
the theory of small samples, not with estimating the actual value of a
parameter, but in ascertaining whether observed values can have arisen 
by sampling fluctuations from some value given in advance. For example, 
if a sample of ten gives a correlation coefficient of +0.1, we shall inquire,
not the value of the correlation in the parent population, but, more gen- 
erally, whether this value can have arisen from an uncorrelated popu- 
lation, i.e., whether it is significant of correlation in the parent.* 


* G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics.
Copyright 1950 by Charles Griffin & Co., Ltd., London, and used by their
permission.

In addition to their susceptibility to relatively large sampling
errors, small sample statistics tend to be susceptible to errors of 
measurement. For example, if the obtained scores in a small sample 
are not reliable, their mean and variance may be poor estimates of 
the mean and variance of the true scores of the sample. A second 
measurement on the same sample might yield a quite different 
mean and variance. In such a situation, any inferences at all tend
to be questionable.

The fact that the t distribution is logically applicable to small
samples does not, as a rule, lessen the desirability of large samples.


Exercises 


51. From Table D, what probabilities correspond to each of the following
values of n and t: (a) n = 1, t = −31.8; (b) n = 1, t = +31.8; (c)
n = 1, t = ±31.8; (d) n = 10, t = −1.09; (e) n = 15, t = ±1.34;
(f) n = ∞, t = +1.96?

52. What values of t demarcate the 5 per cent regions of rejection sketched
in Fig. 8.2 when n = 2? When n = 10? When n = 60?

53. What does the last row of Table D tell us about the t curve?

54. The scores of 10 students in a standardized statistics test were 41, 
50, 65, 61, 56, 59, 74, 23, 47, and 54. If the mean score of students in 
the past in this test is 50, is the present class of 10 an unusual one? 

55. In two random samples of 5 and 6, respectively, the following IQ's were 
observed: 118, 116, 110, 110, 106 and 112, 109, 107, 106, 106, 105. 
(a) Can we believe that either of these samples is from a population 
in which the mean IQ is 100? (b) Can we believe that the sampled
populations are alike in respect to mean IQ? 

56. What are the assumptions underlying the t sampling distribution?

57. In what two ways is the t test of a difference between means superior
to the CR test? In what way is it more restricted?

58. Apply the t test of differences between means of two correlated samples
to the personality inventory data listed on p. 410. Would you expect
the results of the CR test from formula (8.16) to give very different
results than the t test in this case? Why?

59. The following statistics were obtained from a sample of 30: $b_{yx}$ = .22,
$\sigma_x$ = 10.0, $\sigma_y$ = 6.0. How probable is it that $\tilde{b}_{yx}$ = 0?

60. Apply the CR test in exr. 59, above, using the standard error of $b_{yx}$
given in Table 8.4. [Recall that $r_{xy} = b_{yx}(\sigma_x/\sigma_y)$.]

61. In a sample of 5, the coefficient of correlation between speed and
accuracy in arithmetic reasoning is −.65. Is the hypothesis tenable
that $\tilde{r}_{xy} > 0$?

62. Using formula (8.37), determine the absolute value of $r_{xy}$ needed for
significance at the 5 per cent level when N = 5; when N = 10; when
N = 15.

63. In a sample of 9, a rank difference coefficient of correlation $r_d$ of .60
was observed. Test the hypothesis that $\tilde{r}_d = 0$.

64. Test the hypothesis that $\tilde{r}_d < 0$ for the data in Table 6.3; in Table 6.4.

65. A timed reading test was given to a sample of 86, and the scores
correlated with college marks with $r_{12}$ = .35. An untimed parallel form
of the test was given to the same group, and the scores correlated with
college marks with $r_{13}$ = .55. The timed and untimed reading test
scores were correlated with $r_{23}$ = .65. Is the difference between $r_{12}$
and $r_{13}$ too great to be accounted for by sampling fluctuations?


The $\chi^2$ Sampling Distribution


A common problem in research work is that of determining 
whether a set of observed frequencies is consistent with the set of 
frequencies expected if some theory or hypothesis about a popu- 
lation is true. For example, we may hypothesize that participation 
in student activities is unrelated to school grades and compare the 
frequencies observed with the frequencies expected if the hypothesis 
is in fact true, classifying our observations as in Table 6.8, p. 262. 
If the discrepancies are too great to be reasonably ascribed to 
sampling fluctuations, the hypothesis is discredited. As another 
example, we may wish to determine whether the frequencies in the 
classes of a sample distribution differ sufficiently from theoretical 
normal frequencies to discredit the assumption of normality in 
the sampled population. In general, this sort of problem arises 
whenever we are interested in determining whether sample fre- 
quencies in specified classes are compatible with the frequencies 
we would expect in these classes if some theory about the popu- 
lation is true. 

The statistic which is used in such problems is known as $\chi^2$
(chi square), which was defined earlier, p. 263, as

$$\chi^2 = \sum \left[ \frac{(f_o - f_t)^2}{f_t} \right] \qquad (8.39)$$

in which $f_o$ is the observed frequency in a class and $f_t$ is the
frequency expected if a theory or hypothesis is true, the summation
being over all classes in which comparisons are made.

The sampling distribution of $\chi^2$ is of wide usefulness. We shall
find that it is applicable to several problems in connection with
sample variances, as well as to problems concerning the
compatibility of observed and theoretical frequencies in classes.

The Distribution of $\chi^2$. In an experimental study of the $\chi^2$
sampling distribution, 120 samples of 40 each were randomly
selected from the population of Table III, Appendix B, in which
the frequencies of the attributes A, B, and not-A-or-B are 200,
100, and 100, respectively. Hence, in samples of 40, the theoretical
or expected frequencies are 20, 10, and 10. The frequencies of
A, B, and not-A-or-B in each sample were recorded, and the
value of $\chi^2$ as defined in (8.39) was computed. To illustrate, if the
frequencies in a sample were 23, 6, and 11, we would have in tabular
form


                                        ATTRIBUTE
                                  A        B     Not-A-or-B

Observed frequency $f_o$           23        6        11
Theoretical frequency $f_t$        20       10        10
$f_o - f_t$                         3       −4         1
$(f_o - f_t)^2$                     9       16         1
$(f_o - f_t)^2/f_t$              9/20    16/10      1/10


so that the value of $\chi^2$ in that sample would be 9/20 + 16/10 +
1/10 or 2.15.
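The arithmetic of the tabulation above may be checked with a short Python sketch. The function name `chi_square` is an assumption made for illustration; the computation itself is simply formula (8.39).

```python
def chi_square(observed, expected):
    """Sum of (f_o - f_t)^2 / f_t over all classes, as in formula (8.39)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))

# Sample frequencies 23, 6, 11 against theoretical frequencies 20, 10, 10.
print(round(chi_square([23, 6, 11], [20, 10, 10]), 2))  # 2.15
```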

The 120 values of $\chi^2$ thus obtained from the chance discrepancies
between observed and theoretical frequencies are grouped in Table
8.7. The histogram of the distribution is shown in Fig. 8.16. It
will be noted that the histogram follows roughly the shape of the
superimposed curve of the theoretical $\chi^2$ distribution.

The frequency distribution and the histogram obscure an important
feature of the $\chi^2$ values. The values are discrete. In this
situation they necessarily differ by multiples of 1/20; moreover, not
all multiples of 1/20 can be equalled by them. The student can satisfy
himself that this is so by taking samples of 40 from the population (or by
arbitrarily manipulating frequencies) and computing the resulting
$\chi^2$'s. We shall return to this feature a little later.
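The discreteness can also be verified by brute force. The sketch below, in Python, is not part of the original experiment: it runs through every possible division of 40 observations among the three attributes and collects the distinct values of formula (8.39) as exact fractions. The function name `attainable_chi_squares` is a hypothetical one chosen for illustration.

```python
from fractions import Fraction

def attainable_chi_squares(sample_size, expected):
    """All distinct chi-square values for three classes, as exact fractions."""
    values = set()
    for a in range(sample_size + 1):
        for b in range(sample_size + 1 - a):
            observed = (a, b, sample_size - a - b)
            values.add(sum(Fraction((fo - ft) ** 2, ft)
                           for fo, ft in zip(observed, expected)))
    return values

values = attainable_chi_squares(40, (20, 10, 10))
# Express each value in twentieths; every one is a whole number of them.
twentieths = sorted(int(v * 20) for v in values)
print(twentieths[:6])              # [0, 3, 4, 8, 11, 12]
print(Fraction(43, 20) in values)  # True: the value 2.15 found above
```

The smallest attainable values are thus 0, 3/20, 4/20, 8/20, and so on; values such as 1/20 and 2/20 never occur.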


TABLE 8.7
VALUES OF $\chi^2$ IN 120 SAMPLES
OF 40 EACH FROM A POPULATION
HAVING 3 ATTRIBUTES


--------------------
$\chi^2$          f
--------------------
10.50-        1
 9.75-        0
 9.00-        0
 8.25-        2
 7.50-        0
 6.75-        1
 6.00-
 5.25-        6
 4.50-        5
 3.75-        6
 3.00-       12
 2.25-
 1.50-       21
 0.75-       16
 0.00-       43

Fig. 8.16. Histogram of the distribution of 120 $\chi^2$
values (Table 8.7) and curve of theoretical $\chi^2$ distribution
with 2 degrees of freedom.

Several characteristics of the $\chi^2$ statistic and its sampling
distribution can be deduced from the definition (8.39) and the table
and figure. Its value depends upon discrepancies between observed
and theoretical frequencies. If there are no discrepancies, i.e., if
the frequencies are in perfect agreement, $\chi^2$ = 0. Since the
discrepancies are squared, the value of $\chi^2$ cannot be negative. Since
the squared discrepancies are summed, the greater the number of
discrepancies the greater the range of the $\chi^2$ values. If in the above
experiment there had been, say, six classes instead of three involved
in the comparisons of observed and theoretical frequencies, the
range would have been considerably broader.

Fig. 8.17. Curves of theoretical $\chi^2$ distribution for
n = 5, n = 10, and n = 20. (Horizontal axis: value of $\chi^2$;
vertical axis: relative frequency.)

The sampling distribution depends entirely upon the number of
discrepancies contributing to the value of $\chi^2$ which are independent,
i.e., the number of degrees of freedom n. Thus, in effect, there are
many distributions, one corresponding to each n.

The curves of the theoretical distribution* of $\chi^2$ having 5, 10, and
20 degrees of freedom, respectively, are shown in Fig. 8.17. It will
be noted that the shape and position of the curves vary with n,
the number of degrees of freedom: the greater the number, the
wider and the more symmetrical the curve. The curve approaches
the normal curve as n increases; in fact, the normal curve is a
special case of the $\chi^2$ curve.

* The general log equation for the distribution is

$$\log y = -\frac{\chi^2}{2}\log e + \frac{n-2}{2}\log \chi^2 - \frac{n}{2}\log 2 - \log\left(\frac{n-2}{2}\right)!$$

The ordinates y corresponding to various values of $\chi^2$ for a given n can be
determined from this equation.
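Ordinates of the theoretical curve can be computed directly from the log equation of the distribution. The following Python sketch assumes natural logarithms (so that log e = 1) and uses the standard-library function `math.lgamma`, which returns the logarithm of the gamma function; `math.lgamma(n / 2)` is the logarithm of the factorial of (n − 2)/2. The function name `chi_square_ordinate` is an assumption made for illustration.

```python
import math

def chi_square_ordinate(x, n):
    """Ordinate y of the chi-square curve with n degrees of freedom at x > 0."""
    log_y = (-x / 2.0
             + (n - 2) / 2.0 * math.log(x)
             - n / 2.0 * math.log(2.0)
             - math.lgamma(n / 2.0))
    return math.exp(log_y)

for n in (5, 10, 20):
    peak = n - 2  # for n > 2 the curve is highest at chi-square = n - 2
    print(n, peak, round(chi_square_ordinate(peak, n), 4))
```

Printing the ordinate at the peak for each n shows the curve becoming lower and wider as n increases.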

Table E, Appendix III, shows the values of $\chi^2$ which correspond
to given probabilities for n's from 1 to 30. The table is read and
interpreted as follows. When n = 2, the .05 point or the value
beyond which 5 per cent of the area under the curve lies is 5.99.
Hence, for n = 2, if $\chi^2$ equals or exceeds 5.99, the hypothesis that
a given set of discrepancies between observed and theoretical
frequencies is due to sampling fluctuations or chance can be rejected
at the 5 per cent level. On the other hand, when n = 2, the .95 point
or the value beyond which 95 per cent of the area lies is .103. It
follows that only 5 per cent of the $\chi^2$'s computed from random
samples are as small as .103. Hence, for n = 2, if $\chi^2$ is .103 or less
the agreement between observed and theoretical frequencies is so
good that it raises the question whether the sampling technique
permitted chance to operate freely, i.e., whether the compatibility
of the frequencies was fairly tested.

The interpretation of other percentage points for n = 2 and of
the percentage points for the other n's of Table E is similar to
the above. In general, when the probability of a $\chi^2$ value from a
random sample is between about .05 and about .95, the disagreement 
between observed and theoretical frequencies may reasonably be 
ascribed to chance. When the probability is .05 or less the hypothe- 
sis that the disagreement is due to chance is discredited at the 5 per 
cent or better level. When the probability is .95 or greater the 
sampling techniques or the calculations are suspect. It is open to 
the investigator, of course, to adopt more or less exacting probability 
figures. 
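For 2 degrees of freedom the area beyond a given value of chi square is simply e raised to the power −χ²/2, so the percentage points quoted above can be reproduced without a table. The Python sketch below relies on that fact; the function names and the wording of the three verdicts are assumptions made for illustration, and the cut-off figures .05 and .95 are those suggested in the text.

```python
import math

def percentage_point_2df(p):
    """Chi-square value beyond which a proportion p of the area lies (n = 2)."""
    return -2.0 * math.log(p)

def interpret(p, low=0.05, high=0.95):
    """Apply the rule of thumb in the text to the probability p of a chi-square."""
    if p <= low:
        return "discrepancies too large to ascribe to chance"
    if p >= high:
        return "agreement suspiciously close"
    return "discrepancies reasonably ascribed to chance"

print(round(percentage_point_2df(0.05), 2))  # 5.99
print(round(percentage_point_2df(0.95), 3))  # 0.103
# The sample value 2.15 obtained earlier with 2 degrees of freedom:
print(interpret(math.exp(-2.15 / 2.0)))
```

An investigator who adopts more or less exacting probability figures would pass different values of `low` and `high`.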

Assumptions Underlying the $\chi^2$ Test. There are three types
of problems which require comparisons of observed and theoretical
frequencies and the use of the $\chi^2$ sampling distribution. The types
differ chiefly because the theoretical frequencies involved in each
are determined by somewhat different methods. They are
essentially similar in nature, and the assumptions underlying the
application of the $\chi^2$ test are the same for each. Before discussing
the problems, we need to examine these assumptions.

There are several approximations involved in fitting continuous
curves, such as those of Fig. 8.17, to the distribution of the
necessarily discrete values of $\chi^2$, as computed from frequency data.
These approximations may be poor unless the binomial distribution of
observed frequencies about theoretical frequencies in each class is
sufficiently normal to permit approximation by the normal curve.
For example, in the sampling experiment described above, it is
assumed that the numbers of A's, B's, and not-A's-or-B's are
distributed about 20, 10, and 10, respectively, in approximately
normal form. In general, this assumption is considered to be
satisfied if the $f_t$ in any class is not very small. When an $f_t$ is very
small, the $f_o$'s in successive samples tend to range more widely above
$f_t$ than below, since an $f_o$ cannot be less than 0. As we have seen, such
a situation usually results in skewness. How small is a difficult
question, but there is a great deal of experimental evidence and
rather wide agreement among statisticians that in no class should
$f_t$ be less than 5 and that it is safe to go that low only if the total
of the $f_t$'s over all the classes is about 40 or more.

It is also assumed that the probability of obtaining any of the 
attributes or events in question remains constant, or practically so, 
during the sampling process. This assumption is considered to be 
satisfied if (1) the population is large relative to the sample, (2) 
the attributes or events are independent, and (3) the sample is 
random. 

The sampling distribution of $\chi^2$ as defined in (8.39) is based upon
the number of independent discrepancies between observed and
theoretical frequencies. This number is the degrees of freedom n.
It is equal to the total number of classes in which comparisons are
made, i.e., the total number of discrepancies, less the number of
respects or constants in which the theoretical and observed
frequencies are forced to agree. The calculation of n in a given problem
is relatively simple and will be illustrated in connection with the
three general types of problems involving frequency data to which
the $\chi^2$ test is applicable.

The $\chi^2$ Test When Theoretical Frequencies Can Be Determined
from the Size of the Sample. The simplest type of problems
to which the $\chi^2$ test is applicable comprises those in which the
theoretical frequencies, i.e., the frequencies expected if the
hypothesis is true, can be determined from the size of the sample. As an
example of problems of this type consider the following.

In a certain high school the question arose whether referrals by 
teachers of “problem students” to school counselors were distributed
uniformly over the five days of the week. During a given period,
considered to be typical, the following frequencies of first referrals by
days of the week were observed: Monday, 25; Tuesday, 10; Wednes- 
day, 18; Thursday, 24; Friday, 36—a total of 113 referrals. On the 
hypothesis of uniform distribution, the theoretical frequency for 
each day is 113/5 or 22.6. The test of the hypothesis is to determine 
whether the discrepancies between observed and theoretical fre- 
quencies can reasonably be attributed to chance. 

The data may conveniently be arranged in tabular form: 


                       MONDAY   TUESDAY   WEDNESDAY   THURSDAY   FRIDAY

$f_o$                     25        10         18          24        36
$f_t$                    22.6      22.6       22.6        22.6      22.6
$f_o - f_t$              +2.4     −12.6       −4.6        +1.4     +13.4
$(f_o - f_t)^2$           5.76    158.76      21.16        1.96    179.56
$(f_o - f_t)^2/f_t$        .25      7.02        .94         .09      7.95

The value of $\sum[(f_o - f_t)^2/f_t]$ or $\chi^2$ is 16.2, to three-figure accuracy,
with 4 degrees of freedom. In Table E at n = 4, the value 16.2
falls well beyond the 5 per cent point; in fact, the probability
corresponding to 16.2 is less than .005. If the hypothesis is true that
referrals are uniformly distributed by days of the week, the
discrepancies between observed and theoretical frequencies would
result in a $\chi^2$ value as great as 16.2 in fewer than 5 samples in 1,000.
Thus, the hypothesis is strongly discredited. The reason why there
is uneven distribution of referrals, of course, cannot be determined
from the data. It may be that teacher patience or student behavior
varies from day to day, or it may be something apart from these.
What it is can be determined, if at all, only by additional study.
The $\chi^2$ test tells us only that first referrals in this school are very
probably not distributed uniformly over the days of the week,
provided the sample of 113 is a fair sample.
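The referral problem can be worked through in Python as a check on the tabulated arithmetic. In the sketch below the function names are assumptions made for illustration. The probability is obtained from a closed form that holds when the number of degrees of freedom is even, which is the case here (n = 4); it stands in for the look-up in Table E and is not the method of the text.

```python
import math

def chi_square_uniform(observed):
    """Chi-square against equal theoretical frequencies; returns (chi2, n)."""
    ft = sum(observed) / len(observed)
    chi2 = sum((fo - ft) ** 2 / ft for fo in observed)
    return chi2, len(observed) - 1

def upper_tail_even_df(x, n):
    """P(chi-square >= x) for an even number n of degrees of freedom."""
    if n < 2 or n % 2:
        raise ValueError("this closed form needs an even n of 2 or more")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(n // 2))

# First referrals, Monday through Friday.
chi2, n = chi_square_uniform([25, 10, 18, 24, 36])
print(round(chi2, 1), n)                     # 16.2 4
print(upper_tail_even_df(chi2, n) < 0.005)   # True
```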

In problems of this type the number of degrees of freedom n is
always the number of classes k in which comparisons between $f_o$
and $f_t$ are made less one. In symbols, n = k − 1. One degree
of freedom is lost because the theoretical frequencies are determined
from the size of the sample. In other words, $\sum f_o$ and $\sum f_t$ are forced
to agree, i.e., $\sum f_o = \sum f_t$ or $\sum (f_o - f_t) = 0$. In the example above,
only 4 of the 5 discrepancies are independent or free to vary.

An hypothesis concerning the value of a proportion $\tilde{p}$ in a twofold
population may readily be tested by the $\chi^2$ technique. As an
example, let us take the data of the illustrative problem, p. 422,
where 27 in a sample of 36 taxpayers were in favor of a proposal.
These data may be classified


            FAVORABLE   OPPOSED
$f_o$           27          9
$f_t$           18         18


the $f_t$ being deduced from the size of the sample on the hypothesis
that opinion is evenly divided, i.e., that $\tilde{p}$ = .50. Computations
give $\chi^2 = (27 - 18)^2/18 + (9 - 18)^2/18$ or 9.00 with n = 1.
Referring to Table E, we find P < .005. Note that the value of $\chi^2$
here is equal to the square of CR. This relationship always holds
between $\chi^2$'s having 1 degree of freedom and CR's obtained from
the same frequency data.
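The equality of chi square and the square of CR in the taxpayer problem can be confirmed numerically. The Python sketch below assumes that CR is the deviation of the sample proportion from the hypothetical proportion divided by the standard error computed from the hypothetical proportion; the function names are hypothetical.

```python
import math

def chi_square_proportion(successes, n, p_hyp):
    """Chi-square (1 degree of freedom) for a twofold classification."""
    observed = (successes, n - successes)
    expected = (n * p_hyp, n * (1.0 - p_hyp))
    return sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))

def critical_ratio(successes, n, p_hyp):
    """CR: deviation of the sample proportion over its standard error."""
    standard_error = math.sqrt(p_hyp * (1.0 - p_hyp) / n)
    return (successes / n - p_hyp) / standard_error

chi2 = chi_square_proportion(27, 36, 0.50)
cr = critical_ratio(27, 36, 0.50)
print(round(chi2, 2), round(cr, 2), round(cr ** 2, 2))  # 9.0 3.0 9.0
```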

Hypotheses about a twofold population proportion $\tilde{p}$ may be
tested by either the CR or the $\chi^2$ technique. The latter usually is
somewhat more easily applied and, as we shall see later, is
indispensable when it is desired to combine the results of several tests
of significance into a single probability figure. On the other hand,
confidence limits of a population proportion are more readily
estimated by the CR technique.

It is to be remembered that, in problems where frequency data
are classified in k classes and where theoretical frequencies can be
specified just as soon as the size of the sample is known, $\chi^2$ has
k − 1 degrees of freedom.

The $\chi^2$ Test in Analyzing Contingency Data. In our discussion
of contingency correlation we defined the contingency coefficient
C in terms of $\chi^2$ and mentioned that the significance of C could
readily be tested by the $\chi^2$ test.

When used in connection with contingency data the $\chi^2$ test
indicates whether one variable as classified is independent of the other,
or, conversely, whether the two variables are related. The hypothesis
to be tested is always that there is no relationship between the
two. If the discrepancies between the frequencies observed and the
frequencies expected if the hypothesis is true lead to a $\chi^2$ value too
great to be reasonably ascribed to chance, the hypothesis is
discredited and real relationship between the variables is demonstrated.

Let us go back to the data of Table 6.8 for an illustration of this
use of $\chi^2$. Those contingency data meet the conditions for the
application of the $\chi^2$ test, so we may proceed. The value of $\chi^2$ was 22.68.
For reasons to be noted below, this particular $\chi^2$ has 4 degrees of
freedom. Entering Table E at n = 4, we find that 22.68 is well
beyond the 0.5 per cent point. The hypothesis that there is no
relationship between participation in student activities and semester
grades is strongly discredited. There is apparently real relationship,
which, by examination of the data, we find to be inverse.

The reason why $\chi^2$ has 4 degrees of freedom in this problem is to
be found in the procedure by which the $f_t$'s are determined. Under
the hypothesis that there is only random association between the
variables, the $f_t$'s in the various cells of the table are determined
from the marginal totals, as shown on p. 262. The $f_t$'s and $f_o$'s thus
are forced to agree in at least 5 of the 6 marginal totals, and hence
in any row or in any column the discrepancies sum to 0. As a result
only 4 of the 9 discrepancies contributing to the value of $\chi^2$ are
independent or free to vary. The student can verify that when any
4 of the 9 discrepancies are known, the remaining 5 are fixed. Thus,
$\chi^2$ has only 4 degrees of freedom.

It is important to note that, although this procedure limits or 
restricts the information provided by the sample, it does not in 
any way introduce bias. Forcing the $f_t$'s and $f_o$'s to agree in marginal
totals does not influence the relation between the variables. In one
or more cells, depending upon n, the frequencies remain free to
differ and to discredit the hypothesis.

In general, in the contingency table consisting of h rows and k
columns, the number of degrees of freedom n is (h − 1)(k − 1).

All contingency table data which satisfy the assumptions
underlying the $\chi^2$ test may similarly be tested to see whether they indicate
correlation in the sampled population. These include the data
classified for tetrachoric or fourfold point correlation analysis.
Whether a tetrachoric or fourfold point coefficient calculated from
the 2 × 2-fold table is significantly different from 0 can readily be
determined by the $\chi^2$ test. The computation of the $f_t$'s in all such
tables follows that illustrated on p. 262. In general, the theoretical
frequency in any cell of the contingency table is equal to the product
of the marginal totals of the row and the column containing the cell
divided by the total number in the sample.
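The rule for theoretical frequencies and the rule for degrees of freedom translate directly into a short Python sketch. The function names are assumptions made for illustration, and the 3 × 3 frequencies are hypothetical figures invented for the purpose; they are not the data of Table 6.8.

```python
def theoretical_frequencies(table):
    """Cell f_t = (row total x column total) / N for an h x k table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def contingency_chi_square(table):
    """Return (chi-square, degrees of freedom) for an h x k table."""
    expected = theoretical_frequencies(table)
    chi2 = sum((fo - ft) ** 2 / ft
               for obs_row, exp_row in zip(table, expected)
               for fo, ft in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 3 x 3 frequencies, invented purely for illustration.
table = [[20, 30, 10],
         [30, 40, 30],
         [10, 30, 20]]
chi2, df = contingency_chi_square(table)
print(round(chi2, 2), df)
```

Because the theoretical frequencies are built from the marginal totals, the discrepancies in every row and every column of the table sum to zero.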

The $\chi^2$ test may be used in testing whether the difference between
proportions obtained from two independent samples is significantly
different from zero. The statistics from the two samples may be
classified in a table essentially similar to the contingency table,
and the $\chi^2$ test run as usual.


EXAMPLE. In a sample of 65 10-year-olds the proportion passing 
a given item of an intelligence test is .40. In a sample of 84 
11-year-olds the proportion passing the item is .50. Is this dif- 
ference in difficulty significant? 

Turning the proportions into absolute frequencies and clas- 
sifying the data, we have 


                   FAIL      PASS

11-year-olds        42        42       84
                  (45.7)    (38.3)

10-year-olds        39        26       65
                  (35.3)    (29.7)

                    81        68      149


The theoretical frequencies, shown in parentheses, are ob- 
tained from the marginal totals under the hypothesis that there 
is no real difference between the proportions or, what is the 
same thing, between observed frequencies of "pass" and
"fail" in the two samples. If the hypothesis is true, the
frequencies expected, the $f_t$'s, in the cells will be proportional
to the marginal totals.

The usual calculations give $\chi^2 = (42 - 45.7)^2/45.7 + (42 - 38.3)^2/38.3
+ (39 - 35.3)^2/35.3 + (26 - 29.7)^2/29.7 = 1.51$
with 1 degree of freedom. Consulting Table E at n = 1, we
find that the hypothesis cannot be rejected, since .25 > P >
.10. The difference in difficulty is not significant.

The student can verify that the result is the same as that
obtained by the CR technique, using the standard error of
(8.18), and that (CR)² is about 1.5. As previously noted, $\chi^2$
with 1 degree of freedom is equal to (CR)².


The value of $\chi^2$ may be obtained directly from the fourfold
contingency table by means of the formula

$$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} \qquad (8.40)$$

where N is the total frequency and the other symbols have their
usual meaning in the fourfold table.
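That the direct formula agrees with the cell-by-cell computation can be checked in Python on the item-difficulty example. The sketch assumes that A and B are the frequencies in the upper row of the fourfold table and C and D those in the lower row; the function names are hypothetical. Carried without rounding, both routes give about 1.48; the figure 1.51 in the example arises from rounding the theoretical frequencies to one decimal place.

```python
def fourfold_chi_square(a, b, c, d):
    """Direct chi-square for a fourfold table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def cellwise_chi_square(a, b, c, d):
    """The same quantity from theoretical frequencies, cell by cell."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    cells = ((a, b), (c, d))
    return sum((cells[i][j] - rows[i] * cols[j] / n) ** 2
               / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))

# 11-year-olds: 42 fail, 42 pass; 10-year-olds: 39 fail, 26 pass.
print(round(fourfold_chi_square(42, 42, 39, 26), 2))  # 1.48
print(round(cellwise_chi_square(42, 42, 39, 26), 2))  # 1.48
```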

The $\chi^2$ test can also be applied to differences between proportions
in two correlated samples (see Ref. 13). When the samples are
correlated, the quantity

$$\chi^2 = \frac{(A - D)^2}{A + D} \qquad (8.41)$$

where A and D have their usual meaning in the fourfold table, has
one degree of freedom. It is left as an exercise for the student to
show that the CR test of a difference between correlated sample
proportions, using the standard error of (8.21), gives the same
results as (8.41).

It sometimes happens that an investigator will have two fre- 
quency distributions which he wishes to compare and thereby to 
decide whether it is reasonable to believe that the two may have 
arisen by chance in sampling from similar populations. The
comparison can readily be made by classifying the distributions in an
h × 2-fold table and proceeding as in the contingency table. (See
exr. 70.)

Correction for Discreteness of $\chi^2$ Values Having One Degree
of Freedom. As previously noted, $\chi^2$ values computed from
frequency data are necessarily discrete. The discreteness tends to
be marked for $\chi^2$'s having 1 degree of freedom, particularly when
$f_t$ is small. When this is the case, the sampling distribution is better
approximated by the continuous $\chi^2$ curve if the absolute value of
each difference between $f_o$ and $f_t$ is reduced by .5 before it is squared.
This reduction is known as the “Yates Correction for Continuity.”

The correction is applicable only in situations where $\chi^2$ has
1 degree of freedom. It is quite comparable to the .5 correction
employed in approximating the binomial distribution by the
normal curve, and, like that correction, it ordinarily may be ignored
in practice. However, it should not be ignored if its application
would change a decision about an hypothesis.

The correction can be embodied in formulas (8.40) and (8.41) if,
before squaring, |AD − BC| is reduced by N/2 and |A − D| is
reduced by one.
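The effect of the correction can be seen by applying it to the two problems with 1 degree of freedom worked earlier. The Python sketch below is illustrative only; the function names are assumptions, and the first function supposes that every discrepancy is at least .5 in absolute value, as it is in these data.

```python
def yates_chi_square(observed, expected):
    """Chi-square with each |f_o - f_t| reduced by .5 before squaring."""
    return sum((abs(fo - ft) - 0.5) ** 2 / ft
               for fo, ft in zip(observed, expected))

def yates_fourfold(a, b, c, d):
    """Formula (8.40) with |AD - BC| reduced by N/2 before squaring."""
    n = a + b + c + d
    return (n * (abs(a * d - b * c) - n / 2.0) ** 2
            / ((a + b) * (c + d) * (a + c) * (b + d)))

# Taxpayer problem: uncorrected chi-square was 9.00.
print(round(yates_chi_square([27, 9], [18, 18]), 2))  # 8.03
# Item-difficulty problem: uncorrected chi-square was about 1.5.
print(round(yates_fourfold(42, 42, 39, 26), 2))       # 1.1
```

In neither problem does the correction change the decision reached about the hypothesis.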

Goodness of Fit. The $\chi^2$ test is frequently useful in testing
assumptions or hypotheses regarding the form of a population
frequency distribution. When used in this connection, it is commonly
referred to as a test of “goodness of fit.” The name is not
particularly definitive, since all $\chi^2$ tests of frequency data may be
regarded as goodness of agreement or fit of observed and theoretical
frequencies.

Essentially the test consists of determining whether a sample
frequency distribution is sufficiently well fitted by some theoretical
form, say the normal, to have arisen in sampling from a population
distributed in that form. In illustration of the test, let us determine
whether the agreement between the observed and theoretical
frequencies of Table 5.2 is sufficiently good to support the assumption
that the sampled population is of normal form. The $f_o$ and $f_t$ from
that table are now shown in Table 8.8. As noted in Chapter V, the $f_t$
are the theoretical normal frequencies in a distribution with
N = 138, $\bar{X}$ = 552.11, and $\sigma$ = 79.32. They are the frequencies
expected under the assumption of normality.

Before computing $\chi^2$, the theoretical frequencies in the upper
three and in the lower two classes are pooled. This should always
be done when an $f_t$ is less than about 5. After pooling these
frequencies, 11 discrepancies remain, but only 8 of them are
independent. In determining the theoretical frequencies, as shown in
Chapter V, the theoretical and observed frequencies are forced to
agree in three constants, namely, N, $\bar{X}$, and $\sigma$. The procedure
results in the three restrictions,

$$\sum f_t = \sum f_o, \qquad \sum f_t X = \sum f_o X, \qquad \sum f_t X^2 = \sum f_o X^2.$$

TABLE 8.8
$\chi^2$ TEST OF GOODNESS OF FIT OF NORMAL DISTRIBUTION
(Data from Table 5.2)


$f_o$     $f_t$     $f_o - f_t$     $(f_o - f_t)^2/f_t$

1 .88 

216 vien 23 01 

3 3.91 

8 6.44 

12 10.35 

13 15.29 

17 19.13 

18 20.80 

18 19.57 

17 15.98 

16 11.08 

10 1.04 

2 3.77 

i} это] 56 3.56 1.93 
Sum     138     138.01                    7.16


The value of $\chi^2$ is 7.16, with n = 11 − 3 = 8. Referring to Table E
we find that this value corresponds to a P of about .50. In sampling
from a normal population, under the given conditions, disagreement
between $f_o$ and $f_t$ as great as the observed would be expected about
50 per cent of the time, due merely to sampling fluctuations. There
is thus no reason to doubt the assumption of normality in the
sampled population. It is important to note that the assumption
is not proved; it is only shown to be tenable by the $\chi^2$ test.
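The pooling of end classes and the count of degrees of freedom can be sketched in Python as follows. The function names are assumptions made for illustration, the sketch pools only at the two ends of the distribution, and the frequencies are hypothetical figures invented for the purpose; they are not those of Table 8.8.

```python
def pool_small_tails(f_o, f_t, minimum=5.0):
    """Pool end classes inward until the f_t at each end is at least minimum."""
    f_o, f_t = list(f_o), list(f_t)
    while len(f_t) > 2 and f_t[0] < minimum:
        first_o, first_t = f_o.pop(0), f_t.pop(0)
        f_o[0] += first_o
        f_t[0] += first_t
    while len(f_t) > 2 and f_t[-1] < minimum:
        last_o, last_t = f_o.pop(), f_t.pop()
        f_o[-1] += last_o
        f_t[-1] += last_t
    return f_o, f_t

def normal_fit_chi_square(f_o, f_t):
    """Return (chi-square, n), n being the classes left after pooling less 3."""
    pooled_o, pooled_t = pool_small_tails(f_o, f_t)
    chi2 = sum((o - t) ** 2 / t for o, t in zip(pooled_o, pooled_t))
    return chi2, len(pooled_t) - 3

# Hypothetical frequencies, invented for illustration (not Table 8.8).
f_o = [1, 3, 9, 14, 12, 8, 2, 1]
f_t = [1.2, 3.9, 8.8, 13.1, 12.7, 7.6, 2.1, 0.6]
print(pool_small_tails(f_o, f_t)[0])  # [4, 9, 14, 12, 11]
chi2, n = normal_fit_chi_square(f_o, f_t)
print(round(chi2, 2), n)
```

Here eight classes reduce to five after pooling, so that only 5 − 3 or 2 discrepancies are independent.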

The chief limitation of $\chi^2$ in testing goodness of fit is due to its
failure to regard signs of discrepancies. Inspection of the
discrepancies of Table 8.8 indicates that the $f_o$ are less than the $f_t$ in the
middle classes of the distribution and for the most part greater on
either side. This suggests that the population form may be
somewhat flatter than the normal, although the $\chi^2$ test fails to reveal it.
Unless the signs of the discrepancies tend to be unpatterned, the
$\chi^2$ test of goodness of fit is not appropriate. When the discrepancies
in several consecutive classes of the distribution are alike in sign, the
assumption of normality is better tested by use of the alpha statistics.

In educational research, the theoretical distribution whose fit is
tested usually is the normal. In all such cases, the number of degrees
of freedom is the number of discrepancies minus 3, as noted above.
Occasionally it may be desired to test the fit of a binomial
distribution to a distribution observed in sampling attributes. When
this is done, the number of degrees of freedom is equal to the number
of discrepancies minus the number of independent constants,
determined from the data, needed to construct the theoretical
binomial distribution. (See exr. 69.)

Use of $\chi^2$ in Making Inferences from Small Sample Variances.
The $\chi^2$ distribution is by no means limited to frequency data.
It has several important uses in making inferences from small
sample variances.

If samples of size N are randomly drawn from a normal population
of variance $\tilde{\sigma}^2$, the quantity $N\sigma^2/\tilde{\sigma}^2$ has a $\chi^2$ distribution
with N − 1 degrees of freedom, i.e.,

$$\chi^2 = \frac{N\sigma^2}{\tilde{\sigma}^2} \qquad (8.42)$$

with n = N − 1.

As an application of this theory, consider the following problem.
The variance $\sigma^2$ of the scores in a sample of 18 is 100. Does this
sample contradict the notion that the population variance is 225?

The hypothesis to be tested is that $\tilde{\sigma}^2$ = 225, and the appropriate
sampling distribution is the $\chi^2$ with n = 18 − 1 or 17. Consulting
Table E we find the critical points bounding the two-sided .05
region of rejection to be 7.56 and 30.2. (See Fig. 8.18.)

By formula (8.42) we have χ² = [18(100)]/225 = 8.00. Since
this value does not fall in the .05 region, the hypothesis σ̃² = 225
cannot be rejected at the 5 per cent level.

Fig. 8.18. Two-sided .05 region of rejection in the
χ² distribution for n = 17. (The region of acceptance lies between
the critical points 7.56 and 30.2.)

In this problem, one might wish to test the hypothesis that σ̃² is
as great as 225 or, in symbols, H: σ̃² ≥ 225. If this hypothesis is
false, it must be because σ̃² is less than 225; hence, the appropriate
.05 region of rejection is that at the left of the point 8.67. Since
the computed value 8.00 falls in this region, the hypothesis can
be rejected at the 5 per cent level.
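The two tests just described can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the critical points are simply copied from the text (Table E, n = 17) rather than computed.

```python
def chi_square_for_variance(n_scores, sample_variance, hypothesized_variance):
    """Formula (8.42): N * (sample variance) / (hypothesized population variance)."""
    return n_scores * sample_variance / hypothesized_variance

# Critical points for 17 degrees of freedom, as quoted in the text.
LOWER_TWO_SIDED, UPPER_TWO_SIDED = 7.56, 30.2   # two-sided .05 region
LOWER_ONE_SIDED = 8.67                          # left one-sided .05 region

chi_sq = chi_square_for_variance(18, 100, 225)

# Two-sided test of the hypothesis that the population variance equals 225.
reject_two_sided = chi_sq < LOWER_TWO_SIDED or chi_sq > UPPER_TWO_SIDED

# One-sided test of the hypothesis that the population variance is at least 225.
reject_one_sided = chi_sq < LOWER_ONE_SIDED

print(chi_sq, reject_two_sided, reject_one_sided)
```

The same statistic leads to different decisions because the one-sided test places the whole .05 region in the left tail.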

Formula (8.42) may be used to estimate confidence limits of σ̃².
For example, to estimate the 95 per cent limits in the above problem
where n = 17, the values 30.2 and 7.56 are substituted for χ².
This results in the equations

$$ 30.2 = \frac{18(100)}{\tilde{\sigma}^2}, \qquad 7.56 = \frac{18(100)}{\tilde{\sigma}^2}. $$

Solving these for σ̃² we obtain 59.6 and 238 as the limits desired.

The 95 per cent limits of σ̃ are, of course, √59.6 and √238 or
7.7 and 15.4. It is instructive to compare these with the limits we
would obtain through use of σ_σ and the normal distribution. Since
the value of σ_σ is 10.0/√(2(18)) or 1.67, we would obtain 10.0 ±
1.96(1.67) or about 6.7 and 13.3 as the 95 per cent confidence
limits. The interval between these limits is about 15 per cent less
than the correct interval. When N is greater than about 30, however,
the two methods of estimating σ̃ usually give quite similar
results. The advantage of the χ² method is that it is applicable to
samples of any size. When n is greater than 30, the quantity
√(2χ²) - √(2n - 1) may be treated as a critical ratio, as noted at
the foot of Table E.
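The comparison of the two sets of limits can be checked with a short Python sketch. The variable and function names are illustrative assumptions; the critical points 7.56 and 30.2 are taken from the text, not computed.

```python
import math

N_SCORES, SAMPLE_VARIANCE = 18, 100.0
CHI_LOW, CHI_HIGH = 7.56, 30.2   # Table E points for n = 17, as quoted in the text

def variance_limits(n_scores, sample_variance, chi_low, chi_high):
    """Solve formula (8.42) for the population variance at each critical point."""
    total = n_scores * sample_variance
    return total / chi_high, total / chi_low

var_lo, var_hi = variance_limits(N_SCORES, SAMPLE_VARIANCE, CHI_LOW, CHI_HIGH)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)

# Large-sample alternative: standard deviation plus or minus 1.96 standard errors.
sd = math.sqrt(SAMPLE_VARIANCE)
se_of_sd = sd / math.sqrt(2 * N_SCORES)
normal_lo, normal_hi = sd - 1.96 * se_of_sd, sd + 1.96 * se_of_sd

# Proportion by which the normal-theory interval falls short of the chi-square one.
shortfall = 1 - (normal_hi - normal_lo) / (sd_hi - sd_lo)

print(round(var_lo, 1), round(var_hi, 1), round(sd_lo, 1), round(sd_hi, 1))
print(round(normal_lo, 1), round(normal_hi, 1), round(shortfall, 2))
```

Note that the χ² limits are not symmetrical about the sample value of 10.0, whereas the normal-theory limits are.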

Before leaving this topic, it should be remarked that estimates 
of variances and standard deviations from small samples are notably 
lacking in precision. 

The χ² Test of the Homogeneity of Three or More Estimates
of Variance. One of the commonest questions in statistical analysis
is whether two or more independent sample variances differ
sufficiently to discredit the assumption that the samples are from
equally variable populations. Both the standard error of estimate
and the standard error of measurement concepts explicitly
rest upon the assumption of homoscedasticity or equal scatter of
certain arrays or sets of scores. Correlation analysis, in general,
loses meaning if the scores in the several arrays of the table show
dissimilar variation. The t test of differences between two means
assumes that the sampled populations are equally variable. In a
later section we shall find that the F test of differences among three
or more means rests upon the same assumption. Moreover, it
frequently is true that significant differences among sample variances
are important and informative in their own right. Experimental
treatments, whether or not they produce significant differences
among means of groups, may achieve importance if they produce
significant differences among variances. The question of
homogeneity of variances is one of the most persistent and important
questions in research.

The significance of differences between variances in two samples
may best be tested by the F test. This will be discussed in the next
section. At this time we shall consider a χ² test of the significance
of the differences among variances in three or more samples.

Bartlett (Ref. 2) has shown that if k independent samples are
drawn from the same population or from populations with equal
variance, the quantity

$$ \chi^2 = \frac{2.3026}{C}\left[(\Sigma n)(\log \bar{s}^2) - \Sigma\, n \log s^2\right] \tag{8.43} $$

is distributed approximately* as χ² with k - 1 degrees of freedom.
The quantity C is equal to 1 + [1/(3k - 3)][Σ(1/n) - 1/(Σn)].
The meaning of the other symbols and the application of the test
will be brought out in the following problem.

* The approximation is not entirely satisfactory when the size of any of the
samples is less than 5 and when the resulting χ² is near a critical point. When
this is the case, Hartley's test (Biometrika, 33:296-304, 1946) gives more
dependable results.

A random sample of 24 junior high school pupils was classified
according to occupation of father, and the IQ's in the classes were:


PROFESSIONAL   BUSINESS   SKILLED   SEMISKILLED   UNSKILLED
    120          118        118        112           108
    118          116        112        109           102
    116          110        110        107           101
    112          110        110        106
                 106        108        106
                            108        105


Do the variance estimates provided by these 5 samples contradict 
the assumption that the sampled populations are alike in variance? 

The sums of squares Σx² in the 5 categories or samples are
shown in the first column of Table 8.9, and the degrees of freedom
of each are shown in the second column. The data in the remaining
columns are explained by the captions. The computation of χ² is
made at the foot of the table.

TABLE 8.9
χ² TEST OF THE HOMOGENEITY OF FIVE VARIANCE ESTIMATES

          DEGREES OF             VARIANCE
  Σx²     FREEDOM n     1/n     ESTIMATE s²    LOG s²    n LOG s²
  35.0        3         .333       11.7        1.0682     3.2046
  96.0        4         .250       24.0        1.3802     5.5208
  70.0        5         .200       14.0        1.1461     5.7305
  33.5        5         .200        6.7        0.8261     4.1305
  28.7        2         .500       14.4        1.1584     2.3168
SUM 263.2    19        1.483                             20.9032

Computation of χ² from (8.43):

$$ \bar{s}^2 = \frac{263.2}{19} = 13.85; \qquad \log \bar{s}^2 = 1.1414 $$

$$ C = 1 + \frac{1}{3(5) - 3}\left[1.483 - \frac{1}{19}\right] = 1.119 $$

$$ \chi^2 = \frac{2.3026}{1.119}\,[19(1.1414) - 20.9032] = 1.61 $$

Since there are 5 samples, this χ² has
4 degrees of freedom. Referring to Table E at n = 4, we find
.90 > P > .75. The assumption or hypothesis of equality of
population variances is not contradicted.
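The computation of Table 8.9 can be sketched in Python as follows. The function name is hypothetical, and the code works from unrounded variance estimates, so its result (about 1.6) may differ from the hand computation in the second decimal place.

```python
import math

def bartlett_chi_square(sums_of_squares, degrees_of_freedom):
    """Formula (8.43), using common logarithms and the constant 2.3026 as in the text."""
    k = len(degrees_of_freedom)
    total_df = sum(degrees_of_freedom)
    pooled_variance = sum(sums_of_squares) / total_df
    bracket = total_df * math.log10(pooled_variance) - sum(
        n * math.log10(ss / n)
        for ss, n in zip(sums_of_squares, degrees_of_freedom)
    )
    c = 1 + (sum(1 / n for n in degrees_of_freedom) - 1 / total_df) / (3 * k - 3)
    return 2.3026 * bracket / c, k - 1

# First two columns of Table 8.9.
chi_sq, df = bartlett_chi_square([35.0, 96.0, 70.0, 33.5, 28.7], [3, 4, 5, 5, 2])

# The 5 per cent point for 4 degrees of freedom quoted in the text is 9.49.
homogeneity_discredited = chi_sq >= 9.49
print(round(chi_sq, 2), df, homogeneity_discredited)
```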

In testing homogeneity of variances the interpretation of χ² is
similar to that in testing the compatibility of frequency data. In
the above problem a χ² of 9.49 or more would have discredited the
assumption of homogeneity of variance at the 5 per cent or better
level. On the other hand, a χ² of .711 or less would have indicated
agreement among the variances better than that expected 95 per
cent or more of the time, and thus would have suggested bias in
sampling or mistakes in computation.

The assumption of homoscedasticity underlying correlation
analysis may be tested by the χ² test embodied in (8.43). Let us
apply the test to the scores in the columns of Table 8.10. (The
test may be applied in the same manner to the rows.)

We may think of the Y scores as independent samples classified


TABLE 8.10
χ² TEST OF HOMOSCEDASTICITY IN THE CORRELATION TABLE
(Data from Table I, Appendix B)


ARITHMETIC FUNDAMENTALS X 
3- 8- 13- 18- 23- 28- 33- 38- 43- 48- 
^ 37- 1 3 1 
3 24- 1 5 3 3 3 
5 21- 1 1 и 1 1 
8 18- 2 7 9 10 4 
Ё 15- 4 16 10 5 1 
g 12- 1 8 8 19 10 1 
Ё o 3 2 15 14 12 2 
% 6 |1 1 3 6 13 12 6 
E 3 1 5 6 1 1 
<_0- 3 4 1) 1 4 2 
f       4     6     13     23      54      79      64      27     19      4
ΣY     10    15     85    191     501   1,045   1,018     519    418    103
ΣY²    52    69    697  1,943   5,835  15,853  18,226  10,353  9,404  2,659
Σy²  27.0  31.5  141.2  356.9 1,186.8 2,029.9 2,033.4   376.7  288.0    6.8

n       3     5     12     22      53      78      63      26     18      3


according to the values of X. In all we have 10 samples. If the
assumption of homoscedasticity is sound, the variance estimates
obtained from these 10 samples should not differ beyond sampling
tolerance.

Assuming that the Y's have the mid-values of their respective
classes, we may compute the sum of squares by the formula Σy² =
(1/N)[NΣY² - (ΣY)²]. To illustrate, in the second sample or column
ΣY = 4(1) + 4 + 7 = 15 and ΣY² = 4(1)² + (4)² + (7)² = 69.
Hence, Σy² = (1/6)[6(69) - (15)²] = 31.5. The various sums of
squares and the degrees of freedom of each are shown at the foot
of the table. It is left as an exercise for the student to arrange these
in a table similar to Table 8.9, extend the table, and calculate χ².
It will be found that χ² is equal to about 19.6 with n = 9. Since
for this value P is about .02, the assumption of homoscedasticity is
discredited at the 2 per cent level. Variability in performance in the
Arithmetic Problems Test is probably not the same at all levels of
performance in the Fundamentals Test.
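The sum-of-squares formula used for each column can be sketched in Python. The function name is an illustrative assumption, and the mid-values and frequencies are those of the second column as read from the worked example above.

```python
def sum_of_squares(midvalues_and_frequencies):
    """Sum of squared deviations from the mean: (1/N)[N * sum(Y^2) - (sum Y)^2]."""
    n = sum(f for _, f in midvalues_and_frequencies)
    sum_y = sum(y * f for y, f in midvalues_and_frequencies)
    sum_y_squared = sum(y * y * f for y, f in midvalues_and_frequencies)
    return (n * sum_y_squared - sum_y ** 2) / n

# Second column of Table 8.10: four scores at mid-value 1, one at 4, one at 7.
second_column = [(1, 4), (4, 1), (7, 1)]
ss_second_column = sum_of_squares(second_column)
print(ss_second_column)
```

Working from ΣY and ΣY² in this way avoids computing each deviation from the column mean separately.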

When the assumption of homoscedasticity is not satisfied in the
correlation table, over-all correlation analysis, if made at all,
should be interpreted cautiously. When the data are unequally
variable in the arrays of the table, an r_xy loses meaning as a
descriptive measure. Moreover, if it is used in regression equations,
there is no way of estimating the reliability of predicted scores, since
the standard error of estimate necessarily assumes homoscedasticity.
When bivariate data, classified by intervals on one variable, are
heterogeneous from class to class, the circumstance usually merits
study in its own right.

In using the standard error of measurement to interpret an obtained
score, it is necessary to assume that the errors of measurement are
equally variable over the obtained score scale. To test this
assumption, the differences between scores obtained from parallel
forms or equivalent halves of an instrument may be classified by
selected intervals on the scale and their homogeneity tested as in
Table 8.9.

Concluding Remarks. We have considered only the typical
applications of χ² in educational research. Occasionally it may be
desired to obtain from several sets of similar frequency data a single
probability figure. If the data can be pooled to form a single table,
a χ² test can be applied in the usual manner. It may be, however,
that pooling is impracticable or inadvisable. In this case a single
probability figure can be obtained by utilizing the additive property
of χ². This remarkable property may be stated:

The sum of two or more independent χ² values is itself distributed as
χ² with degrees of freedom equal to the sum of the degrees of freedom
of the respective χ² values.

Thus, if we have obtained from three sets of similar data the χ²
values, 3.00, 2.50, and 3.50, each having 1 degree of freedom, the
sum 9.00 may be interpreted as a single χ² with 3 degrees of freedom.
It will be noted that, although none of the three original values is
significant, the aggregate is significant at about the 2.5 per cent
level. In other words, it is improbable that chance would result in
χ²'s totaling 9.00 in the three situations, considered jointly.
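The additive property can be illustrated with a small Python sketch. The helper below is not part of the text: it is one standard way of obtaining the upper-tail area of χ² for a whole number of degrees of freedom, using only the standard library, and its name is an assumption.

```python
import math

def chi_square_tail(x, df):
    """Upper-tail probability of chi-square with a whole number of degrees of freedom."""
    half = x / 2.0
    if df % 2:   # odd: start from the 1-degree-of-freedom tail
        tail, k = math.erfc(math.sqrt(half)), 1
    else:        # even: start from the 2-degrees-of-freedom tail
        tail, k = math.exp(-half), 2
    while k < df:
        k += 2
        tail += half ** (k / 2 - 1) * math.exp(-half) / math.gamma(k / 2)
    return tail

values = [3.00, 2.50, 3.50]   # each with 1 degree of freedom, as in the text
separate = [chi_square_tail(v, 1) for v in values]
combined = chi_square_tail(sum(values), len(values))

print([round(p, 3) for p in separate], round(combined, 3))
```

Each separate probability exceeds .05, while the probability for the sum with 3 degrees of freedom falls between .025 and .05, in line with the statement above.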

It should be emphasized that such combinations of χ²'s are
permissible only if the separate values are independent, i.e., obtained
from different samples. The Yates correction for continuity should
not be used when χ²'s are to be combined.

Fisher (Ref. 8, p. 100) describes an ingenious method based upon
χ² of combining probability figures from independent tests of
significance, whether or not these figures were originally obtained
from χ² tests.

The χ² test has wide application. It is essentially simple and easily
applied. When the assumptions underlying it are satisfied, it is
usually dependable. Lewis and Burke (Ref. 12) discuss use and
misuse of the test at considerable length. This excellent discussion
should be read by anyone who plans to use the test in problems
differing from the typical ones considered above. (See also W. G.
Cochran, "The χ² Test of Goodness of Fit," Ann. Math. Stat.,
23:315-345, 1952.)


Exercises 


66. Show how the χ² test can be applied to the data of exercises 28, 31, 39,
and 40 of this chapter.

67. Show how the χ² test can be applied to the data of exrs. 32, 33, 34,
36, and 37 of Chapter VI. What information would the test convey in
each? In exr. 36, why is it not permissible to utilize the additive
property of χ² and thereby obtain a single probability figure?

68. Fit a normal curve to one or more of the distributions of exr. 21, p. 63,
and test goodness of fit. (Cf. exr. 10, p. 203.) Discuss the results.

69. Apply the goodness-of-fit test to the data of Table 8.1, after combining
the frequencies in the upper three and the lower three classes. (Note:
Since in this case the theoretical frequencies can be inferred from the
size of the sample, only 1 degree of freedom is lost. Had it been
necessary to estimate p from the data, two degrees of freedom would have
been lost.)
70. (From McCormick) In random samples of families from urban and
rural areas, the distributions by size were:

MEMBERS        URBAN       RURAL
IN FAMILY      FAMILIES    FAMILIES
1                140          34
2                436         121
3                384         119
4                315         110
5                202          88
6                118          66
7                 66          47
8                 37          32
9                 29          20
10 or more        19          24

Can we believe that these families are from populations alike in family
size?

71. In testing the goodness of fit of a normal curve to a distribution of
600 scores, a χ² of 18.50 with 12 degrees of freedom resulted. However,
the values of α₃ and α₄ of the distribution were .26 and 2.45 respectively.
What can account for the inconsistency? Is the assumption of normality
in the population sound?

72. In a random sample of 15, the variance σ² is 50. Is the hypothesis
reasonable that the population variance σ̃² is 100? What are the 95
per cent confidence limits of σ̃²?

73. In School B, Table I, Appendix B, the 18 IQ's yield a sum of squares of
1,807; in School C, the 38 a sum of 7,587; in School C, the 32 a sum of
8,151; and in School J, the 29 a sum of 2,729. Is it probable that the
represented populations are equally variable in respect to IQ?

74. A test was given to 75 individuals and scored on equivalent halves.
The sum of squares of the differences between half-test scores of the
top 25 individuals was 800; of the middle 25, 600; and of the lowest 25,
1,200. Is the idea that the errors of measurement are equally variable
throughout the range consistent with this evidence?

75. In Table I, Appendix B, Schools A, B, C, and D represent a social
environment different from the other schools. How would you test the
hypothesis that the distributions of the IQ's are the same in the two
environments?

76. How would you proceed to test the homoscedasticity of the X's in
Table 6.6?

77. In a classical study of probability, a coin was tossed 4,092 times, and
2,048 heads were observed. Comment.

78. An experimenter reported that a goodness-of-fit test of the normal
curve to a sample of data resulted in a χ² of 2.48 with 11 degrees of
freedom. In what way does this report raise doubt?

79. Find or invent research problems to illustrate each of the three general
applications of the χ² test to frequency data, giving special attention
to the assumptions involved in the test.


The F Sampling Distribution 


When we have two large independent samples with standard
deviations σ₁ and σ₂, respectively, we may test the hypothesis that
the sampled populations are equally variable, i.e., that σ̃₁ - σ̃₂ = 0,
by dividing the difference σ₁ - σ₂ by its standard error
√(σ²_σ₁ + σ²_σ₂), and referring the ratio to the normal probability
scale. When the samples are small, however, such procedure leads to
substantial error, since the sampling distribution of the difference
σ₁ - σ₂ departs markedly from normality when either sample is small.

The most widely useful test of the significance of a difference in 
variability between two samples is based upon the sampling dis- 
tribution of the variance ratio, commonly known as the F ratio. 
The F distribution is the most general of the various sampling dis- 
tributions. It is applicable to samples of any size and to many sorts 
of problems. 

The Distribution of the F Ratio. Let us first define the F ratio.
Suppose that we take a random sample of size N₁ from a normal
population with variance σ̃² and compute the sum of squares Σx₁².
If we divide Σx₁² by N₁ - 1 we shall have s₁², which can be shown to
be an unbiased estimate of σ̃². Now suppose that we independently
take a second sample of size N₂ from the same population or from
a second population with the same variance σ̃². If we divide the
sum of squares Σx₂² in this sample by N₂ - 1 we shall have s₂², a
second unbiased estimate of σ̃². The variance ratio F is defined

$$ F = \frac{\Sigma x_1^2/(N_1 - 1)}{\Sigma x_2^2/(N_2 - 1)} = \frac{s_1^2}{s_2^2} \tag{8.44} $$

In other words, F is the ratio of two unbiased, independent estimates
of a population variance.

The sampling distribution of the F ratio is mathematically complex.
Its shape depends upon n₁ and n₂, the numbers of degrees of
freedom in numerator and denominator, respectively. Every change
in n₁ or n₂ changes the shape of the F distribution. The distribution
is, however, independent of the variance of the sampled population
or populations and of other parameters. It depends solely upon
n₁ and n₂.

In an experimental study of the F distribution a sample of 5 and
a sample of 10 were randomly drawn from a normal population,
and the variance estimate s₁² = Σx₁²/4 was divided by the variance
estimate s₂² = Σx₂²/9. This was repeated 120 times. The F ratios
thus obtained are shown in Table 8.11 and in Fig. 8.19. It will be
noted that the experimental distribution is in fair agreement with
the F curve shown in the figure. This curve represents the theoretical
distribution of the F ratio having 4 degrees of freedom in the
numerator and 9 in the denominator. We may conveniently identify
the ratio by the symbol F₄,₉.

TABLE 8.11
DISTRIBUTION OF 120 F₄,₉ RATIOS IN RANDOM SAMPLES
OF 5 AND 10 FROM A NORMAL POPULATION

F RATIO      f
6.0-6.4      1
5.5-5.9      0
5.0-5.4      2
4.5-4.9      1
4.0-4.4      3
3.5-3.9      0
3.0-3.4      3
2.5-2.9      6
2.0-2.4      4
1.5-1.9     14
1.0-1.4     17
0.5-0.9     39
0.0-0.4     30
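An experiment of the kind described can be imitated with a short Python sketch. The function names, the seeds, and the choice of a standard normal population are assumptions made for illustration; the ratios obtained will not reproduce Table 8.11 exactly, since they depend on the random draws.

```python
import random

def variance_estimate(sample):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)

def simulate_f_ratios(n1, n2, trials, seed):
    """Draw pairs of samples from one normal population and return the F ratios."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        first = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        second = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        ratios.append(variance_estimate(first) / variance_estimate(second))
    return ratios

# 120 ratios, as in the experiment described in the text.
ratios = simulate_f_ratios(5, 10, trials=120, seed=0)

# With many more trials, the share of ratios beyond 3.63 should be near .05.
many = simulate_f_ratios(5, 10, trials=20000, seed=1)
share_beyond_point = sum(r > 3.63 for r in many) / len(many)
print(len(ratios), round(share_beyond_point, 3))
```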

In the above experiment, if the variance estimates in the samples
of 10 had been divided by the estimates in the samples of 5, a
distribution of F ratios having 9 degrees of freedom in the numerator
and 4 in the denominator would have been obtained. These F₉,₄
ratios are of course the reciprocals of the F₄,₉ ratios. The theoretical
distributions of F₄,₉ and F₉,₄ are shown in Fig. 8.20. The curves are
different, but owing to the reciprocal relationship of the respective
ratios, the percentage of area of the F₄,₉ distribution lying to the
left of a given value of F less than one is equal to the percentage
of area in the F₉,₄ distribution lying to the right of 1/F, and vice
versa. For example, the percentage of area in the former lying to
the left of F = .25 is equal to the percentage of area in the latter
lying to the right of 1/.25 or 4.00. Because of this reciprocal
relationship, only the areas in the right-hand tails corresponding to F
values greater than one need to be tabulated.

Fig. 8.19. Histogram of the distribution of 120 F₄,₉
ratios (Table 8.11) and curve of theoretical F₄,₉
distribution.

Fig. 8.20. Curves of the theoretical distributions
of the F₄,₉ (solid line) and the F₉,₄ (broken line) ratios.

Table F, Appendix III, is a typical table of area relationships
under the F curves for selected combinations of n₁ and n₂. In
illustration of the use of the table, to find the values of F to the right
of which specified percentages of the total area of the F₄,₉
distribution lie we enter the table at the column headed 4 and go
down that column to row 9. We note that 10 per cent of the area
lies to the right of the point 2.69, 5 per cent to the right of 3.63,
2.5 per cent to the right of 4.72, and so on. For the F₉,₄ distribution
we go to column 9 and row 4 and note that 10 per cent of the area
lies to the right of the point 3.93, 5 per cent to the right of 6.00,
2.5 per cent to the right of 8.90, and so on. These points are
commonly known as critical points.

The two-sided .05 or 5 per cent region of rejection in the F₄,₉
distribution is determined by the points 4.72 and 1/8.90 or .11,
but in the F₉,₄ distribution the region is determined by the points
8.90 and 1/4.72 or .21.

The right one-sided 5 per cent region of rejection of the F₄,₉
distribution is determined by the point 3.63; the left one-sided region
by the point 1/6.00 or .167. In the F₉,₄ distribution these points are
at 6.00 and 1/3.63 or .28, respectively.

The regions corresponding to other combinations of n₁ and n₂
are similarly found from Table F. (See exr. 80.) The regions for
combinations of n₁ and n₂ not in the table may be roughly
determined by simple linear interpolation. In practice, it is usually the
case that the nearest tabled points are adequate. It is to be
remembered that the left-hand critical points of the F distribution with
n₁ and n₂ degrees of freedom are the reciprocals of the right-hand
critical points of the F distribution with n₂ and n₁ degrees of freedom.

Significance of Differences between Independent Sample
Variances. The question of whether two independent sample
estimates of variance are sufficiently unlike to discredit the hypothesis
that the samples are drawn from equally variable populations is
important both because several statistical techniques, notably the
t test of difference between means, assume equality of variance
and because real differences in variability are themselves usually
informative and important.

The significance of differences between independent sample
variances is readily tested by the variance ratio F. To illustrate
the test, usually called the F test, let us return to the data of the
example on page 464. For those data, n₁ = 4 and n₂ = 5; hence,
the F distribution of concern is F₄,₅. To test the hypothesis that
σ̃₁² = σ̃₂², i.e., that the sampled populations have equal variances,
the two-sided region of rejection is appropriate, since it safeguards
equally well against the alternatives σ̃₁² < σ̃₂² or σ̃₁² > σ̃₂², one of
which must obtain if the hypothesis is in fact false. Hence, we first
locate the upper and lower limits or critical points in the F₄,₅
distribution corresponding to the level of significance we select. If we
select the 5 per cent level, the upper point is 7.39 and the lower is
1/9.36 or .11. Since in our example Σx₁² = 102.8, with n₁ = 4, and
Σx₂² = 35.3, with n₂ = 5, we have, following formula (8.44),

$$ F = \frac{102.8/4}{35.3/5} = \frac{25.7}{7.06} = 3.64. $$

This value of F is well within the region of acceptance, and the
hypothesis that the sampled populations have equal variances is
tenable; in fact, the probability figure for the hypothesis is about
.20. (Why?)
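The test just made can be sketched in Python. The function name is hypothetical, and the two tabled points are copied from the text rather than computed; the lower critical point is obtained as the reciprocal of the upper point of the distribution with the degrees of freedom interchanged.

```python
def variance_ratio(sum_sq_1, df_1, sum_sq_2, df_2):
    """Formula (8.44): ratio of two independent variance estimates."""
    return (sum_sq_1 / df_1) / (sum_sq_2 / df_2)

# Upper 2.5 per cent points quoted in the text.
UPPER_F_4_5 = 7.39   # 4 and 5 degrees of freedom
UPPER_F_5_4 = 9.36   # 5 and 4 degrees of freedom

f_ratio = variance_ratio(102.8, 4, 35.3, 5)
lower_point = 1 / UPPER_F_5_4

# Two-sided 5 per cent test of the hypothesis of equal population variances.
reject = f_ratio > UPPER_F_4_5 or f_ratio < lower_point
print(round(f_ratio, 2), round(lower_point, 2), reject)
```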

The F test of homogeneity of variances should routinely be made
before application of the t test of difference between means of two
independent samples. If the assumption is untenable that the
sampled populations have equal variances, the t test cannot logically
be made or clearly interpreted.

Before leaving this topic we should note that, if the hypothesis
to be tested is that one variance estimate is significantly larger than
the other, the one-sided region of rejection would be appropriate.

Analysis of Variance. In addition to testing the homogeneity
of two variance estimates, discussed above, the F distribution has a
great many applications. The distribution is central in several
important kinds of analyses of statistical data conventionally
subsumed under the topic, "analysis of variance."

The ideas underlying analysis of variance are that (1) the total
sum of squares of a set of classified data can be separated into
components which may be attributed to two or more factors or sources,
and that (2) if the data are randomly drawn from a normal
population or populations with variance σ̃², the various components,
divided by their respective numbers of degrees of freedom, are
unbiased, independent estimates of the variance σ̃². If comparisons
of the estimates, by means of the F ratio, indicate differences too
great to be reasonably explained by chance, it follows that there is
present in the data as classified a factor or factors whose effect is
significant.

The general proof of these ideas is too complicated to be con- 
sidered here. The analysis of the variance of a given set of data 
and the reasoning behind it, however, are not difficult. The simplest 
problem is that of testing the significance of differences among 
several means. 

Significance of Differences among Three or More Means.
The IQ data from page 488 are shown in Table 8.12, the constant
100 having been subtracted from each IQ to facilitate calculations.

TABLE 8.12
IQ's (-100) OF 24 JUNIOR HIGH SCHOOL PUPILS CLASSIFIED
ACCORDING TO OCCUPATION OF FATHER

       PROFESSIONAL   BUSINESS   SKILLED   SEMISKILLED   UNSKILLED
            20           18         18         12            8
            18           16         12          9            2
            16           10         10          7            1
            12           10         10          6
                          6          8          6
                                     8          5
SUM         66           60         66         45           11
MEAN       16.5         12.0       11.0        7.5          3.7

GRAND TOTAL, 248; COMMON OR TOTAL MEAN, M_t = 10.3

In analyzing such data the question of interest usually is whether
the means of the columns (samples) are sufficiently different to
discredit the hypothesis that the samples are from populations
having equal means. If the hypothesis is true, the variance estimate
based upon the variation of the means of the columns will not be
significantly greater than the variance estimate based upon the
variation of the scores within columns. Let us call the former
variance estimate s_m² and the latter s_w².
The estimate s_m² is equal to the weighted sum of squares of the
column means divided by the number of degrees of freedom. This
number is one less than the number of column means, since one
sample statistic M_t, the mean of all of the scores, is used in
calculating s_m². (See p. 461.)

For the illustrative data of Table 8.12, we have

$$ s_m^2 = \frac{4(16.5 - 10.3)^2 + 5(12.0 - 10.3)^2 + 6(11.0 - 10.3)^2 + 6(7.5 - 10.3)^2 + 3(3.7 - 10.3)^2}{5 - 1} = 87.2. $$

The estimate s_w² is equal to the pooled sum of the squares of
deviations from the column means divided by the number of degrees
of freedom. This number is equal to the total number of scores
minus the number of column means, since the column means are
determined from the data.

For the illustrative data,

$$ s_w^2 = \frac{35.0 + 96.0 + 70.0 + 33.5 + 28.7}{24 - 5} = 13.9, $$

since in the first column the sum of squares from the column mean
is (20 - 16.5)² + (18 - 16.5)² + (16 - 16.5)² + (12 - 16.5)² =
35.0; in the second (18 - 12.0)² + (16 - 12.0)² + (10 - 12.0)² +
(10 - 12.0)² + (6 - 12.0)² = 96.0; and so on.

It is convenient to organize our computations in the following
tabular form:

SOURCE OF             SUM OF      DEGREES OF    VARIANCE
VARIATION             SQUARES     FREEDOM       ESTIMATE
Among groups           348.9       5 - 1          87.2
  (column means)
Within groups          263.2      24 - 5          13.9
  (within columns)
TOTAL                  612.1      24 - 1

It will be noted that the total of the two sums of squares is 612.1.
This total will agree, except for effects of rounding, with that
obtained directly by summing the squares of the deviations of all
of the scores from their common mean 10.3. This illustrates the
fact that the total sum of squares is divisible into two components,
one resulting from the variation of the means of the columns, the
other from the variation of the scores within columns.
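The division of the total sum of squares into its two components can be verified with a brief Python sketch using the scores of Table 8.12. The variable names are illustrative. Because the code carries the means unrounded, it gives about 350.2 and 613.3 where the hand computation with rounded means gives 348.9 and 612.1.

```python
groups = {
    "professional": [20, 18, 16, 12],
    "business": [18, 16, 10, 10, 6],
    "skilled": [18, 12, 10, 10, 8, 8],
    "semiskilled": [12, 9, 7, 6, 6, 5],
    "unskilled": [8, 2, 1],
}

scores = [x for column in groups.values() for x in column]
grand_mean = sum(scores) / len(scores)

def mean(values):
    return sum(values) / len(values)

# Weighted sum of squares of the column means about the common mean.
among = sum(len(c) * (mean(c) - grand_mean) ** 2 for c in groups.values())

# Pooled sum of squares of the scores about their own column means.
within = sum((x - mean(c)) ** 2 for c in groups.values() for x in c)

# Sum of squares of all scores about the common mean.
total = sum((x - grand_mean) ** 2 for x in scores)

print(round(among, 1), round(within, 1), round(total, 1))
```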

After obtaining the variance estimates s_m² and s_w², we are ready
to test the hypothesis that there is no significant difference between
the column means. If the hypothesis is true and if the assumptions
of randomness of sampling and population normality are satisfied,
the ratio s_m²/s_w² is distributed as F, with degrees of freedom equal
to those of the estimates. For the data in hand we have

$$ F_{4,19} = \frac{87.2}{13.9} = 6.27. $$

Consulting Table F we find that this value falls well beyond the
5 per cent point; in fact, P < .005. The hypothesis is strongly
discredited. It is highly probable that there are real differences in
mean IQ's in the populations represented by the samples.

In tests of this sort, the variance estimate s_m² derived from the
variation of the means is always placed in the numerator of the F
ratio and only the right-hand region of rejection is considered.
This is because we wish to reject the hypothesis only if the means
differ more than we would expect in sampling from populations
alike in mean value. In other words, we wish to reject the hypothesis
only if s_m² is significantly greater than s_w². Hence, the right-hand
region of rejection is appropriate.

In practice, it usually is easier and more accurate to calculate
the variance estimates from raw rather than deviation scores.
When the N scores are classified in k columns, as in Table 8.12, the
variance estimates may be obtained by

a. Summing the squares of the N scores. For the scores of Table
8.12 this gives

$$ (20)^2 + (18)^2 + (16)^2 + \cdots + (8)^2 + (2)^2 + (1)^2 = 3,176. $$

b. Squaring the grand total of the scores and dividing by N. Thus,

$$ \frac{(248)^2}{24} = 2,562.7. $$

c. Squaring the sum of scores in each column, dividing each
squared sum by the number of scores in its column, and summing
the quotients. Thus,

$$ \frac{(66)^2}{4} + \frac{(60)^2}{5} + \frac{(66)^2}{6} + \frac{(45)^2}{6} + \frac{(11)^2}{3} = 2,912.8. $$

d. Subtracting (b) from (c) and dividing by k - 1. The quotient
is s_m², i.e.,

$$ s_m^2 = \frac{2,912.8 - 2,562.7}{5 - 1} = 87.5. $$

e. Subtracting (c) from (a) and dividing by N - k. The quotient
is s_w², i.e.,

$$ s_w^2 = \frac{3,176 - 2,912.8}{24 - 5} = 13.9. $$

The deviation and raw score methods of computing the variance
estimates usually give somewhat different results, owing to the
effects of rounding. The results obtained by the raw score method
are of course more accurate.
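Steps a to e translate directly into Python. This is a sketch with hypothetical variable names; the scores are those of Table 8.12, and the quantities are carried unrounded, so the resulting F ratio is about 6.32 rather than the 6.27 obtained from the deviation method.

```python
columns = [
    [20, 18, 16, 12],        # professional
    [18, 16, 10, 10, 6],     # business
    [18, 12, 10, 10, 8, 8],  # skilled
    [12, 9, 7, 6, 6, 5],     # semiskilled
    [8, 2, 1],               # unskilled
]
n_scores = sum(len(column) for column in columns)
k = len(columns)

# a. Sum of the squares of the N scores.
step_a = sum(x * x for column in columns for x in column)

# b. Square of the grand total, divided by N.
step_b = sum(sum(column) for column in columns) ** 2 / n_scores

# c. Squared column sums, each divided by its number of scores, then summed.
step_c = sum(sum(column) ** 2 / len(column) for column in columns)

# d. Variance estimate from the variation of the column means.
s_m_squared = (step_c - step_b) / (k - 1)

# e. Variance estimate from the variation within columns.
s_w_squared = (step_a - step_c) / (n_scores - k)

f_ratio = s_m_squared / s_w_squared
print(step_a, round(step_b, 1), round(step_c, 1))
print(round(s_m_squared, 1), round(s_w_squared, 1), round(f_ratio, 2))
```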

The general variance ratio for testing the significance of
differences among means of three or more samples, classified as in
Table 8.12, is

$$ F_{k-1,\,N-k} = \frac{s_m^2}{s_w^2} \tag{8.45} $$

In order to show mathematically that the ratio of (8.45) is
distributed as F, it is necessary to assume that the sampled populations
are normal with equal variances. The latter assumption can be
tested by the χ² test of homogeneity embodied in (8.43). It will be
recalled that the data of Table 8.12 were used to illustrate that test.
The assumption of normality cannot be satisfactorily tested when
the samples are small, but must be judged in light of what is known
a priori or what can be deduced from the nature of the data. There
is considerable empirical evidence to show that the ratio of (8.45)
follows the F distribution closely, despite moderate violations of
the assumptions of normality and homogeneity of variance.

The F test of (8.45) is widely useful. It may be thought of as a
generalized t test; in fact, in the special case when k = 2, the F test
and t test yield identical results. (See exr. 87.) Whenever data can
be classified according to some relevant scheme, as in Table 8.12, the
test indicates whether there are significant differences among the
means of the classes.

The Significance of a Correlation Ratio. In a correlation 
table the Y scores are classified in columns according to selected 
intervals of the paired X scores. As pointed out in Chapter VI, if 
there is relationship between X and Y, the column means tend to 
fall into some sort of pattern, linear or otherwise, and to vary more 
among themselves than the scores within columns. (See Fig. 6.10.) 
The strength of the relationship is measured by the correlation ratio. 

The correlation ratio of Y on X, as defined in (6.37), is 


\[ \eta_{yx}^2 = 1 - \frac{\sigma_w^2}{\sigma_y^2}, \]

where $\sigma_w^2$ is the pooled variance of the scores about their respective
column means in the correlation table and $\sigma_y^2$ is the variance of the
scores about the common mean $\bar{Y}$. If we multiply both members
of the fraction at the right by N we will have in the numerator the
sum of squares of deviations of the scores within columns about
their respective means and in the denominator the total sum of
squares of the deviations of the scores about the common mean $\bar{Y}$.
Let us designate the within-column sum of squares $\Sigma y_w^2$ and the
total sum of squares $\Sigma y_t^2$. In this notation, $\eta_{yx}^2$ may be expressed

\[ \eta_{yx}^2 = 1 - \frac{\Sigma y_w^2}{\Sigma y_t^2} = \frac{\Sigma y_t^2 - \Sigma y_w^2}{\Sigma y_t^2}. \]

Now the difference $\Sigma y_t^2 - \Sigma y_w^2$ is equal to the weighted sum of
squares of the deviations of the column means from the common
mean $\bar{Y}$. Calling this sum of squares $\Sigma y_m^2$, we may write

\[ \eta_{yx}^2 = \frac{\Sigma y_m^2}{\Sigma y_t^2}. \]

By a bit of algebraic manipulation of these two expressions for $\eta_{yx}^2$,
we obtain

\[ \frac{\Sigma y_m^2}{\Sigma y_w^2} = \frac{\eta_{yx}^2}{1 - \eta_{yx}^2}. \]
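
The manipulation may be spelled out as follows. The sketch uses only the two expressions for the squared correlation ratio given above, writing $\Sigma y_w^2$, $\Sigma y_m^2$, and $\Sigma y_t^2$ for the within-column, between-column-means, and total sums of squares; the subscripts are a notational assumption adopted here for clarity.

```latex
\begin{align*}
1 - \eta_{yx}^2 &= \frac{\Sigma y_w^2}{\Sigma y_t^2},
\qquad
\eta_{yx}^2 = \frac{\Sigma y_m^2}{\Sigma y_t^2}, \\[4pt]
\frac{\eta_{yx}^2}{1 - \eta_{yx}^2}
&= \frac{\Sigma y_m^2 / \Sigma y_t^2}{\Sigma y_w^2 / \Sigma y_t^2}
 = \frac{\Sigma y_m^2}{\Sigma y_w^2}.
\end{align*}
```

The total sum of squares cancels, so the ratio of the two component sums of squares depends on the data only through the correlation ratio.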

As we have seen, $\Sigma y_m^2$ divided by k − 1 (k being the number of
columns in the correlation table) provides one estimate of $\sigma_y^2$ and
$\Sigma y_w^2$ divided by N − k provides a second. If the assumptions of
normality and homoscedasticity are satisfied, the ratio of the two
estimates is distributed as F. It follows that

\[ F_{k-1,\,N-k} = \frac{\eta_{yx}^2/(k-1)}{(1-\eta_{yx}^2)/(N-k)} \qquad (8.46) \]

has an F distribution.

If the numerator of (8.46) is significantly larger than the
denominator, the means of the columns vary significantly more than
the scores within columns. When this is the case, real relationship
is demonstrated, i.e., the correlation ratio $\eta_{yx}$ is greater than zero.

To illustrate the F test of the significance of the correlation ratio,
let us return to the data of Table 6.16. For those data, $\eta_{yx}^2 = .673$,
k = 10, and N = 54. Substituting in (8.46) we obtain

\[ F_{9,\,44} = \frac{.673/9}{.327/44} = 10.1. \]

Consulting Table F we find P < .005. The correlation ratio $\eta_{yx}$ is
highly significant.
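
The substitution in (8.46) is easily mechanized. The following is a minimal sketch; the function name `correlation_ratio_f` is an assumption introduced here, and the inputs are the values quoted above for Table 6.16.

```python
def correlation_ratio_f(eta_sq, k, n):
    """F ratio of (8.46) for a squared correlation ratio.

    eta_sq: squared correlation ratio, k: number of columns, n: number of cases.
    Returns (F, numerator df, denominator df).
    """
    df1, df2 = k - 1, n - k
    f_ratio = (eta_sq / df1) / ((1.0 - eta_sq) / df2)
    return f_ratio, df1, df2


f_ratio, df1, df2 = correlation_ratio_f(0.673, 10, 54)
print(f"F({df1}, {df2}) = {f_ratio:.1f}")  # F(9, 44) = 10.1
```

The probability corresponding to the ratio must still be read from an F table.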

Formula (8.46) may be readily adapted to test the significance
of the correlation ratio $\eta_{xy}$.

As a rule, correlation analysis may well begin with a test of 
the significance of the correlation ratios. If either of these is sig- 
nificant, real relationship between the variables X and Y is demon- 
strated. Whether the relationship is linear, however, must be deter- 
mined by supplementary analysis. 

Prior to the discovery of the F test, the test of the significance
of the correlation ratio was based upon the standard error
$\sigma_\eta = (1 - \eta^2)/\sqrt{N}$ and the normal probability scale. This test is
inadequate, both because the standard error is inexact and because the
sampling distribution of the correlation ratio is not normal, even
for large samples. However, the old procedure is occasionally
useful in estimating rough confidence limits of $\eta$.

Testing Linearity of Regression. The fundamental assumption 
underlying correlation analysis is that of linearity of regression. To 
test the soundness of the assumption, we need only to compare the 
variance estimate based upon the squares of the deviations of the 
column means from the fitted straight line of regression with that 
based upon the squares of deviations of the scores within columns
about their respective means. If the former is significantly greater
than the latter, the assumption of linearity is unsound. 

The breakdown of the total sum of squares about the common
mean $\bar{Y}$ preliminary to calculating the variance estimates of
concern is considerably more complicated than the breakdown
preliminary to testing the correlation ratio, and we shall not consider
it here. A comprehensive discussion may be found in Ref. 18.

The breakdown results in sums of squares which are proportional
to $\eta_{yx}^2 - r_{xy}^2$ and $1 - \eta_{yx}^2$, so that the variance ratio is equivalent to

\[ F_{k-2,\,N-k} = \frac{(\eta_{yx}^2 - r_{xy}^2)/(k-2)}{(1-\eta_{yx}^2)/(N-k)} \qquad (8.47) \]

where k is the number of columns in the correlation table and
the other symbols have their usual meaning.
Let us apply (8.47) in testing the linearity of regression of Y
on X for the data of Table 6.16. For those data, $\eta_{yx}^2 = .673$, $r_{xy}^2 = .593$,
k = 10, and N = 54, so that

\[ F_{8,\,44} = \frac{(.673 - .593)/8}{(1 - .673)/44} = 1.35. \]

This value is not significant, and we conclude that the assumption
of linearity of Y on X is tenable.
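
The test of linearity can be sketched in the same way. The function name `linearity_f` is an assumption introduced here; the inputs are the squared correlation ratio, the squared product-moment coefficient, k, and N quoted above for Table 6.16.

```python
def linearity_f(eta_sq, r_sq, k, n):
    """F ratio of (8.47) for testing linearity of regression.

    The numerator reflects departure of the column means from the fitted
    straight line; the denominator reflects variation within columns.
    Returns (F, numerator df, denominator df).
    """
    df1, df2 = k - 2, n - k
    f_ratio = ((eta_sq - r_sq) / df1) / ((1.0 - eta_sq) / df2)
    return f_ratio, df1, df2


f_lin, df1, df2 = linearity_f(0.673, 0.593, 10, 54)
print(f"F({df1}, {df2}) = {f_lin:.2f}")  # F(8, 44) = 1.35
```

A ratio this close to 1 indicates that the column means depart from the straight line by no more than would be expected from the variation within columns.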

It is left as an exercise for the student to show that the
assumption of linearity of X on Y in Table 6.16 likewise is tenable. Formula
(8.47) may be readily adapted to testing this assumption.

As a rule, the soundness of the assumption of linearity can, for
practical purposes, be satisfactorily judged by inspection of the
correlation table. If there is doubt, however, or if careful and exact
analysis is needed, the assumption should be tested by the above
test. It cannot be overemphasized that linearity of regression is a
necessary condition of meaningful correlation analysis.

Significance of the Multiple Correlation Coefficient. The
standard error of a multiple coefficient of correlation is
approximately

\[ \sigma_R = \frac{1 - R^2}{\sqrt{N}}. \]

The use of this standard error is greatly limited, however, because
the sampling distribution of R is not normal, even for large samples.
It is occasionally of value in estimating rough confidence limits of
R, but is not appropriate in tests of significance.

The variance ratio technique permits an exact test of the
hypothesis that R = 0. It will be recalled that $R^2\sigma_1^2$ indicates the amount of
variance of the dependent variable $X_1$ which is accounted for by
the correlation of $X_1$ with the independent variables and that
$\sigma_1^2(1 - R^2)$ indicates the variance of the residuals about the plane
of regression, i.e., the residual or error variance. The sum of squares
for the former is $NR^2\sigma_1^2$, with m − 1 degrees of freedom, m being
the total number of variables correlated. The sum of squares for
the latter is $N\sigma_1^2(1 - R^2)$ with N − m degrees of freedom. These
sums of squares divided by their respective numbers of degrees of
freedom yield independent, unbiased estimates of the population
variance $\sigma_1^2$. The ratio of the estimates is

\[ F_{m-1,\,N-m} = \frac{NR^2\sigma_1^2/(m-1)}{N\sigma_1^2(1-R^2)/(N-m)} = \frac{R^2/(m-1)}{(1-R^2)/(N-m)} \qquad (8.48) \]


If the F ratio of (8.48) is significant, the variance estimate based
upon the sum of squares ascribable to correlation is significantly
greater than the estimate based upon the residual or error sum of
squares. When this is the case, R is significant, i.e., the hypothesis
that R = 0 is discredited.

Applying the test to the data of Table 6.13, where $R^2 = .407$,
N = 23, and m = 3, we have

\[ F_{2,\,20} = \frac{.407/2}{.594/20} = 6.85. \]

In Table F we read .01 > P > .005. The hypothesis that R = 0 is
strongly discredited.
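
A minimal sketch of the computation in (8.48) follows. The function name `multiple_r_f` is an assumption introduced here. Because the sketch takes 1 − R² directly from the rounded value .407, it divides by .593 rather than the .594 used above and so gives 6.86 rather than 6.85; the difference is one of rounding only.

```python
def multiple_r_f(r_sq, n, m):
    """F ratio of (8.48) for the hypothesis that the multiple R is zero.

    r_sq: squared multiple correlation, n: number of cases,
    m: total number of variables correlated (dependent plus independent).
    Returns (F, numerator df, denominator df).
    """
    df1, df2 = m - 1, n - m
    f_ratio = (r_sq / df1) / ((1.0 - r_sq) / df2)
    return f_ratio, df1, df2


f_r, df1, df2 = multiple_r_f(0.407, 23, 3)
print(f"F({df1}, {df2}) = {f_r:.2f}")  # F(2, 20) = 6.86
```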

It is interesting to note that, in the special case when m = 2,
formula (8.48) is equal to the square of (8.37). This of course
follows from the facts that (1) when m = 2, R and $r_{12}$ are identical,
and (2) an F ratio with one degree of freedom in the numerator
and n degrees in the denominator is equal to the square of the t
ratio having n degrees of freedom.

It frequently is desirable to know whether a regression equation
containing $m_1$ independent variables has significantly greater
predictive value than one containing a smaller number, $m_2$ say, of
independent variables chosen from the original $m_1$ variables, i.e., whether
the longer equation explains a significantly greater amount of the
variance of the dependent variable than the shorter. In connection
with the dental school data of p. 307, for example, we might ask
whether the equation involving predental grades, scholastic
aptitude, and mechanical aptitude predicts dental school grades
significantly better than one involving only predental grades and
mechanical aptitude.

This kind of question may readily be answered by the variance
ratio technique. If $R_1$ and $R_2$ are respectively the multiple
correlation coefficients of the dependent variable with the $m_1$ and $m_2$
independent variables and N the number of cases, the ratio

\[ F = \frac{(R_1^2 - R_2^2)/(m_1 - m_2)}{(1 - R_1^2)/(N - m_1 - 1)} \]

is distributed as F with degrees of freedom $n_1 = m_1 - m_2$ and
$n_2 = N - m_1 - 1$.

Let us return to the question above concerning the dental school
data. From previous calculations we have $R_1^2 = .526$ with $m_1 = 3$.
Application of formulas (6.29) and (6.32) gives $R_2^2 = .502$ with
$m_2 = 2$, i.e., the square of the coefficient of correlation of dental
school grades with predental grades and mechanical aptitude is
.502. Since N = 146, we have the ratio

\[ F = \frac{(.526 - .502)/(3 - 2)}{(1 - .526)/(146 - 3 - 1)} = 7.19 \]

with 1 and 142 degrees of freedom. The corresponding P is about
.01. We can conclude that the equation containing predental grades, 
scholastic aptitude, and mechanical aptitude is better than the one 
containing the first and third. However, the gain in using the three, 
although significant, is slight. This is seen when the corresponding 
standard errors of estimate are compared. 
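
The comparison of a longer with a shorter regression equation can be sketched as follows. The function name `added_variables_f` is an assumption introduced here; the inputs are the values quoted above for the dental school data.

```python
def added_variables_f(r1_sq, r2_sq, m1, m2, n):
    """F ratio for the gain of an m1-variable equation over an m2-variable one.

    r1_sq, r2_sq: squared multiple correlations for the longer and shorter
    equations; m1, m2: numbers of independent variables; n: number of cases.
    Returns (F, numerator df, denominator df).
    """
    df1, df2 = m1 - m2, n - m1 - 1
    f_ratio = ((r1_sq - r2_sq) / df1) / ((1.0 - r1_sq) / df2)
    return f_ratio, df1, df2


f_gain, df1, df2 = added_variables_f(0.526, 0.502, 3, 2, 146)
print(f"F({df1}, {df2}) = {f_gain:.2f}")  # F(1, 142) = 7.19
```

Note that the denominator uses the residual variance of the longer equation, so the test asks whether the added variable reduces the residuals by more than chance would.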

Significance of Differences among Means in Double Classi- 
fication. As a concrete approach to the analysis of data classified 
according to some relevant scheme in r rows and c columns, con- 
sider the data of Table 8.13. 

TABLE 8.13

SCORES OF 7 TEACHERS AS RATED BY 8 SUPERVISORS ON A 10 POINT SCALE

                                  SUPERVISOR                        SUM OF   MEAN OF    DEVIATION
TEACHER                A     B     C     D     E     F     G     H     ROW    ROW, M_r   M_r − M_t
a                      3     4     7     7     3     2     6     6      38      4.8        −.8
b                      8     9     9     8     3     9     4     8      58      7.2        1.6
c                      6     4     9     6     5     2     7     3      42      5.2        −.4
d                      6     9     7     5     1     9     7     7      51      6.4         .8
e                      5     3     6     3     1     8     4     4      34      4.2       −1.4
f                      4     5     5     4     6     8     6     3      41      5.1        −.5
g                      6     9     7     3     5     8     5     4      47      5.9         .3
SUM OF COLUMN         38    43    50    36    24    46    39    35     311
MEAN OF COLUMN, M_c  5.4   6.1   7.1   5.1   3.4   6.6   5.6   5.0     TOTAL MEAN, M_t = 311/56 = 5.6
DEVIATION, M_c − M_t −.2    .5   1.5   −.5  −2.2   1.0    .0   −.6

These data were obtained in an investigation of a teacher rating
scale which was claimed to be objective and reliable. The scale
contained a check list of observable characteristics of the "good 
teacher" each of which presumably could be observed as “present” 
or “absent” in rating a particular teacher. Possible scores ranged 
from 0 to 10. Using the scale, eight student supervisors in a class 
in educational administration and supervision rated seven teachers 
who were believed to vary considerably in teaching efficiency. The 
ratings are shown in Table 8.13, classified according to supervisor 
and teacher. 

In analyzing the ratings the questions which arise are: (1) If 
allowance is made for differences among supervisors, are there 
significant differences among teachers, as rated? (2) If allowance 
is made for differences among teachers, do the supervisors differ 
significantly in assigning ratings? 

To study these questions we assume that the rc ratings are
independently observed values of the variable measured by the scale
and that the variable is distributed normally with variance $\sigma^2$. If
the assumptions are satisfied and if the classification of the ratings
in rows and columns is random, i.e., free from teacher and
supervisor effects, three estimates of the population variance $\sigma^2$ are
available. One of these, call it $s_r^2$, is provided by the variation of the
means $M_r$ of the rows, the mean ratings of the seven teachers. A
second estimate $s_c^2$ is provided by the variation of the means $M_c$
of the columns, the means of the ratings assigned by the eight
supervisors. The third estimate $s_e^2$ is provided by the variation
remaining among the ratings after that due to the means of rows
and the means of columns has been removed.

To calculate $s_r^2$ the deviations of the means $M_r$ of the rows from
the total mean $M_t$ are squared and weighted by 8, since there are
8 ratings in each row, and the sum is divided by 6, the number of
degrees of freedom. This gives

\[ s_r^2 = \frac{8(-.8)^2 + 8(1.6)^2 + 8(-.4)^2 + 8(.8)^2 + 8(-1.4)^2 + 8(-.5)^2 + 8(.3)^2}{6} = \frac{50.4}{6} = 8.40. \]


There are 6 and not 7 degrees of freedom in this case, because the
deviations of row means are measured from the total mean which
must be determined from the data. As a consequence the deviations
of the row means must sum to 0, and only 6 of the 7 are independent.

The estimate $s_c^2$ is obtained from the deviations of the means $M_c$
of the columns from $M_t$ in like manner. For this estimate, we have

\[ s_c^2 = \frac{7(-.2)^2 + 7(.5)^2 + 7(1.5)^2 + 7(-.5)^2 + 7(-2.2)^2 + 7(1.0)^2 + 7(-.6)^2}{7} = \frac{62.9}{7} = 8.99. \]


In practice the remainder sum of squares leading to $s_e^2$ usually is
obtained by subtracting the sums of squares of row and column
means from the total sum of squares. The total sum of squares, or
the sum of squares of the deviations of the 56 ratings from the
total mean, is

\[ (3 - 5.6)^2 + (4 - 5.6)^2 + (7 - 5.6)^2 + \cdots + (8 - 5.6)^2 + (5 - 5.6)^2 + (4 - 5.6)^2 = 276.0 \]

and the remainder sum of squares is 276.0 − 62.9 − 50.4 or 162.7.
This sum of squares has 42 degrees of freedom, so that

\[ s_e^2 = \frac{162.7}{42} = 3.87. \]


There are 42 degrees of freedom here because only 42 of the 56
observations contributing to $s_e^2$ are independent. This can be
verified by reasoning similar to that followed in determining the
degrees of freedom in the h × k-fold contingency table.

It will be instructive to compute $s_e^2$ directly from the observations.
The deviations of the 56 ratings from the total mean are accounted
for partly by the deviations of their respective row and column
means from the total mean and partly by chance discrepancy or
error. Consider, for example, the rating 3 in the upper left-hand
cell of the table. This rating deviates from the total mean by −2.6.
The row mean deviates by −.8 and the column mean by −.2 from
the total mean. This leaves −1.6 as a chance discrepancy. The
first rating in the second row deviates from the total mean by +2.4.
Its row and column mean deviate from the total mean by +1.6 and
−.2, respectively. This leaves a discrepancy of 1.0. The 56
discrepancies thus obtained are:

        A      B      C      D      E      F      G      H
a     −1.6   −1.3     .7    2.7     .4   −3.8    1.2    1.8
b      1.0    1.3     .3    1.3   −2.0     .8   −3.2    1.4
c      1.0   −1.7    2.3    1.3    2.0   −4.2    1.8   −1.6
d      −.2    2.1    −.9    −.9   −3.2    1.6     .6    1.2
e      1.0   −1.7     .3    −.7   −1.0    2.8    −.2     .4
f      −.9    −.6   −1.6    −.6    3.1    1.9     .9   −1.5
g       .3    2.6    −.4   −2.4    1.3    1.1    −.9   −1.3

Obviously if each rating differed from the total mean by an
amount just equal to the sum of the deviations of its row and column
means from the total mean, there would be no discrepancies. That
there are discrepancies indicates that not all of the variation of the
56 observations is accounted for by variation of means of rows and
columns. These discrepancies presumably are due to chance or
experimental error. For this reason the variance estimate $s_e^2$ derived
from them is sometimes called error variance.
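
The discrepancies may also be computed mechanically. The sketch below is an illustration under stated assumptions: the ratings are those of Table 8.13 (each row and column sum agrees with the sums printed in the table), the variable names are introduced here, and the means are carried at full precision rather than rounded to one decimal. For that reason the sum of squared discrepancies comes to about 163.2, slightly below the figure obtained from the rounded means.

```python
# Ratings of Table 8.13: one row per teacher (a to g),
# one column per supervisor (A to H).
ratings = [
    [3, 4, 7, 7, 3, 2, 6, 6],
    [8, 9, 9, 8, 3, 9, 4, 8],
    [6, 4, 9, 6, 5, 2, 7, 3],
    [6, 9, 7, 5, 1, 9, 7, 7],
    [5, 3, 6, 3, 1, 8, 4, 4],
    [4, 5, 5, 4, 6, 8, 6, 3],
    [6, 9, 7, 3, 5, 8, 5, 4],
]

r, c = len(ratings), len(ratings[0])
total_mean = sum(sum(row) for row in ratings) / (r * c)
row_means = [sum(row) / c for row in ratings]
col_means = [sum(row[j] for row in ratings) / r for j in range(c)]

# Discrepancy = deviation of the rating from the total mean, less the
# deviations of its row mean and its column mean from the total mean.
discrepancies = [
    [
        (ratings[i][j] - total_mean)
        - (row_means[i] - total_mean)
        - (col_means[j] - total_mean)
        for j in range(c)
    ]
    for i in range(r)
]

error_ss = sum(d * d for row in discrepancies for d in row)
error_variance = error_ss / ((r - 1) * (c - 1))
print(f"first discrepancy = {discrepancies[0][0]:.3f}")
print(f"error sum of squares = {error_ss:.1f}")
print(f"error variance = {error_variance:.2f}")
```

The discrepancies in every row and in every column sum to zero, which is why only (r − 1)(c − 1) = 42 of the 56 are independent.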

When the 56 discrepancies are squared and summed, we obtain
163.6 as the remainder or error sum of squares. Owing to the effects
of rounding, the sum differs slightly from that obtained above by
subtraction.

If we assemble our various calculations in a table we shall have:


SOURCE OF               SUM OF     DEGREES OF    VARIANCE
VARIATION               SQUARES    FREEDOM       ESTIMATE
Row means                 50.4         6           8.40
Column means              62.9         7           8.99
Remainder or error       162.7        42           3.87
TOTAL                    276.0        55


To test the hypothesis that there is no significant difference
among the means of the rows, the mean ratings of the seven teachers,
the variance ratio

\[ F_{6,\,42} = \frac{8.40}{3.87} = 2.17 \]

is referred to Table F. The ratio falls short of significance; in fact
the corresponding P is greater than .10. The hypothesis cannot be
rejected. This rating procedure fails to bring out significant differ- 
ences among teachers. It may be that none exist, it may be that 
the rating scale is unreliable, it may be that the supervisors are 
poor observers. We can infer only that, as applied in this situ- 
ation, the power of the scale to discriminate among teachers is 
not demonstrated. 

To test the hypothesis that there are no significant differences 
among the means of the columns, between the average ratings 
assigned by supervisors, we have the variance ratio 


\[ F_{7,\,42} = \frac{8.99}{3.87} = 2.32. \]

This ratio, too, falls short of significance.

We thus finally conclude that the ratings of Table 8.13 could
easily have arisen in sampling from populations in which there are
in fact no differences among teachers and no differences among
supervisors in respect to rating teachers. The results of this
application of the rating scale are without significance.

In analyses like the above, it usually is easier to compute the
various sums of squares from raw rather than deviation scores.
This may be done in a table involving r rows and c columns by

a. Squaring each score, summing the squares, and then subtracting 
the quotient of the total sum squared divided by the total num- 
ber. This gives the total sum of squares. In Table 8.13, the 
total sum of squares is 


\[ (3)^2 + (4)^2 + \cdots + (5)^2 + (4)^2 - \frac{(311)^2}{56} = 275.8. \]


b. Squaring the sum of the scores in each row, summing the squares,
dividing by c, and then subtracting the quotient of the total
sum squared divided by the total number. This gives the sum of
squares of the row means. Thus,

\[ \frac{(38)^2 + (58)^2 + \cdots + (41)^2 + (47)^2}{8} - \frac{(311)^2}{56} = 50.2. \]


c. Squaring the sum of scores in each column, summing the squares,
dividing by r, and then subtracting the quotient of the total
sum squared divided by the total number. This gives the sum
of squares of the column means. Thus,

\[ \frac{(38)^2 + (43)^2 + \cdots + (39)^2 + (35)^2}{7} - \frac{(311)^2}{56} = 62.4. \]


d. Subtracting the sums of squares obtained in (b) and (c) from 
that in (a). This gives the remainder or error sum of squares. 
Thus, 

275.8 — 50.2 — 62.4 = 163.2. 


The deviation and raw score methods of computing the sums of 
squares usually give slightly different results, owing to effects of 
rounding. The results obtained by the raw score method are more 
exact. 
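
Steps (a) to (d) can be sketched as a short routine. The function name `two_way_sums_of_squares` and the variable names are assumptions introduced here; the ratings are those of Table 8.13. Because the raw score sums of squares are used, the variance ratios come to about 2.15 for rows and 2.29 for columns, a little below the 2.17 and 2.32 obtained from the rounded deviations.

```python
# Ratings of Table 8.13: rows are teachers a to g, columns are supervisors A to H.
ratings = [
    [3, 4, 7, 7, 3, 2, 6, 6],
    [8, 9, 9, 8, 3, 9, 4, 8],
    [6, 4, 9, 6, 5, 2, 7, 3],
    [6, 9, 7, 5, 1, 9, 7, 7],
    [5, 3, 6, 3, 1, 8, 4, 4],
    [4, 5, 5, 4, 6, 8, 6, 3],
    [6, 9, 7, 3, 5, 8, 5, 4],
]


def two_way_sums_of_squares(table):
    """Raw score method: returns (total, rows, columns, remainder)."""
    r, c = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    # Quotient of the total sum squared divided by the total number.
    correction = grand_total ** 2 / (r * c)
    ss_total = sum(x * x for row in table for x in row) - correction       # step a
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction         # step b
    col_sums = [sum(row[j] for row in table) for j in range(c)]
    ss_cols = sum(s * s for s in col_sums) / r - correction                # step c
    ss_error = ss_total - ss_rows - ss_cols                                # step d
    return ss_total, ss_rows, ss_cols, ss_error


ss_total, ss_rows, ss_cols, ss_error = two_way_sums_of_squares(ratings)
r, c = len(ratings), len(ratings[0])
s2_rows = ss_rows / (r - 1)
s2_cols = ss_cols / (c - 1)
s2_error = ss_error / ((r - 1) * (c - 1))
f_rows = s2_rows / s2_error
f_cols = s2_cols / s2_error

print(f"sums of squares: total {ss_total:.1f}, rows {ss_rows:.1f}, "
      f"columns {ss_cols:.1f}, remainder {ss_error:.1f}")
print(f"F rows = {f_rows:.2f}, F columns = {f_cols:.2f}")
```

Neither ratio reaches significance, so the conclusion drawn from the deviation method is unchanged.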

It is to be remembered that the variance estimate $s_r^2$ is obtained
by dividing the sum of squares from (b) above by r − 1; $s_c^2$ by
dividing the sum of squares from (c) by c − 1; and $s_e^2$ by dividing
the sum of squares from (d) by (r − 1)(c − 1).

The variance ratio for testing the significance of differences among
row means is

\[ F_{r-1,\,(r-1)(c-1)} = \frac{s_r^2}{s_e^2} \qquad (8.49) \]

while that for testing the significance of differences among column
means is

\[ F_{c-1,\,(r-1)(c-1)} = \frac{s_c^2}{s_e^2} \qquad (8.50) \]

The variance ratio technique of analyzing data capable of double 
classification is powerful and of wide application. In an experiment 
on teaching methods involving several teachers, for example, an 
investigator might wish to classify the average scores of the pupils 
on an examination at the end of the experiment according to 
method and teacher. The question of first concern here would be 
whether there are significant differences among method means,
independent of teachers. 

As another example, in studying school transportation costs, it 
might be desired to classify average costs according to ownership
of buses as “private,” “district,” and “joint,” and according to 
school size, geographical region, or some other characteristic. 

As still another example, it might be desired to determine whether 
a group of examiners differ significantly in the marks they inde- 
pendently assign in a clinical situation to medical school graduates 
applying for licenses to practice in a particular state. This example 
is very similar to the illustrative problem discussed above. 

In a broad sense, double classification analysis may be thought 
of as a method of determining the significance or consequences of 
classifying data according to two relevant variables. In many cases 
one of the variables is of paramount concern, the other serving as a 
control. Here the analysis provides an exact test of the significance 
of classification on the principal variable, independent of the second 
variable. 

In all such classifications, if it can be assumed that the data are
independent and normally distributed, the analysis is made like
that illustrated above. If the differences among row or column
means are not significant, the classification is without consequence
in respect to the row or the column variable, i.e., randomness
prevails in that respect. However, even though randomness prevails
with respect to both variables, the data may merit further analysis
if they are unequally variable within rows or within columns, as
established by the $\chi^2$ test contained in (8.43). If, for example, in
Table 8.13 significant heterogeneity existed in the rows of
individual ratings of the teachers or in the columns of ratings assigned
by the supervisors, the heterogeneity might in itself be important
and well worth study.

Reliability of c (≥ 2) Measurements. The schemata of Table
8.13 may readily be extended to the problem of estimating the
reliability of c (≥ 2) series of measurements or observations; in fact,
the first question we asked about the ratings in the table was
essentially a question of reliability. Moreover, the square root of
the remainder or error variance of the ratings is analogous to the
standard error of measurement $\sigma_e$ discussed in Chapter VII.

When we have c scores for each of r individuals, as would be the 
case if c parallel forms of a test were administered to a group or if 
the same test were repeated c times, we may readily classify the 
scores in c columns and r rows. The means of the rows provide un- 
biased estimates of the true scores of the individuals. If the variance 
estimate obtained from these means is significantly greater than 
the remainder or error variance estimate, the test discriminates 
significantly between the individuals, i.e., the estimated true score 
variance is significantly greater than error variance. 

This reasoning leads to the variance ratio of (8.49). If the ratio 
is not significant, the observations have no demonstrable reliability. 
This was the case for the ratings of Table 8.13. If the ratio is signifi- 
cant, the observations possess some reliability, i.e., they discriminate 
significantly between the individuals. How much reliability is an 
important question and one which is not answered directly by the 
ratio. It is possible for a ratio to be significant, yet the reliability 
of the observations to be too low to permit any of the uses listed 
on p. 351. 

There are several methods of obtaining estimates of reliability
of c observations on r individuals, estimates which may be
interpreted in much the same way as the reliability coefficient. Analogous
to the procedure contained in equation (7.10), an estimate may be
obtained by subtracting the ratio of error variance to total variance
from one. Applying this procedure to the ratings of Table 8.13,
where the error variance is 3.87 and the total variance is 276.0/55
or 5.02, we obtain .23 as an estimate of the reliability of the ratings.
As another method, we might find the product-moment coefficients
of correlation between the c series of ratings, taken two series at a
time, and average the coefficients.

The most convenient and satisfactory way of estimating the
reliability, however, is by intraclass correlation procedures. The
intraclass coefficient of correlation, $r_i$, as defined by Fisher (Ref. 8,
p. 222), is equivalent to

\[ r_i = \frac{s_m^2 - s_w^2}{s_m^2 + (c - 1)s_w^2} \qquad (8.51) \]

where $s_m^2$ is the variance estimate based upon individual mean scores
and $s_w^2$ is the variance estimate based upon variation of scores
within individuals. The coefficient is thus equivalent to the ratio
of true score variance to obtained score variance. (As we saw in
Chapter VII, the ratio of true score variance to obtained score
variance is equal to the reliability coefficient as determined by
correlation methods from parallel forms or test-retest scores.)
Applying formula (8.51) to the ratings of Table 8.13, where $s_m^2 = 8.40$,
$s_w^2 = 4.60$, and c = 8, we get

\[ r_i = \frac{8.40 - 4.60}{8.40 + 7(4.60)} = .09. \]

When the observations are classified and analyzed as in Table
8.13, it saves work to use $s_e^2$ rather than $s_w^2$ in formula (8.51). Making
this substitution, noting that $s_r^2 = s_m^2$, and dividing numerator and
denominator of (8.51) by $s_e^2$, we obtain as a convenient
approximation of $r_i$

\[ r_i = \frac{F - 1}{F + c - 1} \qquad (8.51') \]

where F is defined as in (8.49). Applying formula (8.51′) to the
ratings of Table 8.13, we get

\[ r_i = \frac{2.17 - 1}{2.17 + 7} = .13. \]


The coefficient $r_i$ is roughly comparable to the coefficient of
reliability $r_{11}$. In the special case where c = 2, i.e., where there are
two scores for each of the individuals, $r_i$ tends to be about equal
to $r_{11}$.
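
The three reliability estimates discussed in this section can be compared side by side. The sketch below is an illustration only; the function names are assumptions introduced here, and the inputs are the variance estimates quoted above for Table 8.13.

```python
def intraclass_r(s2_means, s2_within, c):
    """Intraclass coefficient of (8.51) from the two variance estimates."""
    return (s2_means - s2_within) / (s2_means + (c - 1) * s2_within)


def intraclass_r_from_f(f_ratio, c):
    """Approximation (8.51') using the variance ratio of (8.49)."""
    return (f_ratio - 1.0) / (f_ratio + c - 1.0)


def one_minus_error_ratio(error_variance, total_variance):
    """Estimate obtained by subtracting error/total variance from one."""
    return 1.0 - error_variance / total_variance


r_from_variances = intraclass_r(8.40, 4.60, 8)
r_from_f = intraclass_r_from_f(2.17, 8)
r_simple = one_minus_error_ratio(3.87, 5.02)

print(f"(8.51):  {r_from_variances:.2f}")   # 0.09
print(f"(8.51'): {r_from_f:.2f}")           # 0.13
print(f"1 - error/total: {r_simple:.2f}")   # 0.23
```

All three estimates are low, which agrees with the finding that the ratings do not discriminate significantly among the teachers.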

As noted above, when we have c scores for each of r individuals,
an estimate of the standard error of measurement of an individual
score, analogous to $\sigma_e$, may be obtained by taking the square root
of the error variance estimate $s_e^2$. Thus, the standard error of
measurement attaching to an individual rating of Table 8.13 is $\sqrt{3.87}$
or about 2.0.

Before leaving this topic, let us note that the variance ratio of
(8.50) enables one to test the significance of differences between
means of columns and thus to determine whether practice effects,
differences in mean difficulty of the c forms of the test or c
administrations of the same test, differences among judges, and so on
are significant. Just what this test tells us, of course, depends upon
the source of the data.

Analysis of Covariance. The analysis of variance techniques, 
described above, provide a method of testing the significance of 
differences among means of scores classified according to some par- 
ticular variable independent of the effects of a second variable. 

It also is possible to introduce control in two or more classes of 
experimental data by making allowance for initial differences 
among the classes which may have prejudiced the results of the
treatment. This sort of control frequently is desirable in
experimental work. For instance, in an experiment dealing with several
methods of teaching reading, it would be desirable in analyzing 
the final results to take into account differences among students 
in original reading ability, if the groups had not been equalized in 
this respect at the beginning of the experiment. As another instance, 
in comparing average per capita costs in groups of schools classified 
according to size, it may be advisable to take into account differ- 
ences in pupil services rendered, assuming that these can be re- 
liably measured. Such control is possible in situations where there 
is available an associated measure for each of the final experimental 
measures. The analysis of differences among classes of final experi- 
mental data, taking into account differences existing in the asso- 
ciated initial data, is conventionally known as analysis of covariance. 

The sum of products of two variables X and Y is defined as the
sum of products of the deviations from the means, or, in symbols,
$\Sigma xy$. The covariance of X and Y is defined as the sum of products
divided by N, $\Sigma xy/N$. It will be seen that “sum of products” and
“covariance” are analogous to “sum of squares” and “variance.”
In the analysis of covariance, strictly speaking, it is the variance
of Y independent of X, rather than the covariance, which is
analyzed. This will become apparent as we proceed.

Although the mechanics of covariance analysis and the under- 
lying assumptions are somewhat involved, it is not difficult to follow 
the analysis of a given problem. Consider, for example, Seidman’s 
study of the training of retarded children (Ref. 19). The part of 
the study we shall consider here consisted essentially of dividing 
fifteen retarded children into three groups, A, B, and C. The children 
of group A received socializing experiences, mainly supervised 
group play, and their mothers concurrently took part in group 
conferences devoted to discussion of the problems of training re- 
tarded children. The children of group B received the same treat- 
ment as those of group A, but their mothers received no treatment. 
Neither children nor mothers of group C received treatment. (The 
experiment would have been more conclusive had there been a 
fourth group in which the mothers received treatment and the 
children none, but this was not feasible.) 

The social quotients of the three groups of children, obtained 
from the Vineland Social Maturity Scale before and after the ex- 
perimental period, are shown in Table 8.14. The deviations of the 
quotients from the various means also are shown in the table. These 
are included to simplify discussion. In practice, raw rather than 
deviation scores ordinarily are used in computations. 

In analyzing these data, the chief question is whether significant 
differences among the means of the final social quotients of the 
three groups remain after adjustment for the differences present in 
the initial social quotients. 

It will be instructive to examine the data graphically. The
deviations $x_t$ and $y_t$ of the social quotients from the total means $\bar{X}_t$
and $\bar{Y}_t$, respectively, are plotted in Fig. 8.21, and the line of
regression of Y on X is shown. The residuals about this line, as noted in
Chapter VI, are the deviations of Y independent of X. They indicate
the extent to which variables other than X influence Y, provided the
regression is linear.

Fig. 8.21. Deviations from total means and total
line of regression of Y on X. For this line $b_t$ =
1536.6/1437.4 = 1.07, so that y′ = 1.07x. (From
Table 8.14.)

Recalling that the standard error of estimate $\sigma_{y.x}$
is the standard deviation of the residuals about the line of regression
of Y on X, we may square formula (6.17), substitute $\Sigma y^2/N$ for $\sigma_y^2$
and $(\Sigma xy)^2/\Sigma x^2\,\Sigma y^2$ for $r^2$, multiply by N, and simplify to obtain


\[ N\sigma_{y.x}^2 = \Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2} \qquad (8.52) \]

which gives the sum of squares of the residuals about the line of
regression of Y on X. Let us call this sum $\Sigma {y''}^2$.

Substituting the appropriate values at the foot of Table 8.14 in
formula (8.52), we obtain, as the sum of squares of residuals about
the total line of regression,

\[ \Sigma {y''}^2 = 2{,}649.8 - \frac{(1{,}536.6)^2}{1{,}437.4} = 1{,}007.2. \]

This residual sum of squares is the total sum of squares of
deviations of Y from the total mean $\bar{Y}_t$, minus the sum of squares
accounted for by regression.* It may be divided into two parts, one
of which can be attributed to the scatter of the scores about a
common within-group line of regression, the other to the differences
among the group means.

If it can be assumed that the three true within-group regression
coefficients are equal, i.e., that the within-group regression
lines are parallel, we can bring the means together by plotting
the respective deviations $(x_a, y_a)$, $(x_b, y_b)$, $(x_c, y_c)$ on common axes,
as shown in Fig. 8.22, and thus eliminate that part of $\Sigma {y''}^2$
which is due to differences among means.

Fig. 8.22. Deviations from group means and common within-group
line of regression. For this line $b_w$ = 1136.0/1270.4 = .89, so that
y′ = .89x. (From Table 8.14.)

The sum of squares of the residuals about the common
within-group regression line of Fig. 8.22 may be computed by
extension of formula (8.52). Calling this sum $\Sigma {y_w''}^2$ and
substituting the appropriate values from the foot of Table 8.14 in the
formula, we have

\[ \Sigma {y_w''}^2 = 1{,}426.4 - \frac{(1{,}136.0)^2}{1{,}270.4} = 410.6. \]


Now the difference $\Sigma y_t''^2 - \Sigma y_w''^2$ is the sum of squares which
may be attributed to the differences among final means independent
of initial differences. Calling this sum of squares $\Sigma y_m''^2$, we have

$$\Sigma y_m''^2 = 1{,}007.2 - 410.6 = 596.6.$$

*It was shown in Chapter VI that $\sigma_y^2 = \sigma_{y'}^2 + \sigma_{y \cdot x}^2$, where $\sigma_{y'}^2$ was the
variance of Y accounted for by X and $\sigma_{y \cdot x}^2$ was the variance of the residuals.
Multiplying this equation by N, we have, in the present notation,
$\Sigma y^2 = \Sigma y'^2 + \Sigma y''^2$.
The sums of squares $\Sigma y_m''^2$ and $\Sigma y_w''^2$, divided by the appropriate
numbers of degrees of freedom, provide two independent unbiased
estimates of the population variance about the regression line of
Y on X. The number of degrees of freedom $n_1$ for $\Sigma y_m''^2$ is the number
of final means minus one. One degree of freedom is lost because
the total mean $\bar{Y}$ is computed from the data. This amounts to the
single restriction that the deviations of the final group means from
the total mean sum to zero. In the present case, $n_1 = 3 - 1 = 2$. In
general, $n_1 = k - 1$, k being the number of groups.

The number of degrees of freedom $n_2$ for $\Sigma y_w''^2$ is the number in
the total sample minus the number of final means minus one. In
the present case, $n_2 = 15 - 3 - 1 = 11$. Four degrees of freedom are
lost because the three group means and the common within-group
regression coefficient, four constants in all, are computed from the
data. In general, $n_2 = N - k - 1$.

If the assumptions are satisfied that the true within-group regres- 
sion coefficients are equal and that the Y's are equally variable and 
normally distributed about the within-group regression lines, the 
ratio 


$$F_{k-1,\,N-k-1} = \frac{\Sigma y_m''^2/(k-1)}{\Sigma y_w''^2/(N-k-1)} \tag{8.53}$$
has an F distribution. If the numerator of (8.53) is significantly 
greater than the denominator, the differences among final means 
are significant. In the present case, we have 


$$F_{2,11} = \frac{596.6/2}{410.6/11} = 7.99,$$

so that, from Table F, .01 > P > .005. The conclusion follows
that the differences among final means are highly significant, taking
into account differences among initial means. In other words, the
differences among final mean social quotients of the three groups 
of retarded children are not reasonably accounted for either by 
initial differences or by sampling fluctuations. The experimental 
treatments apparently result in real differences. 
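
The arithmetic of the covariance analysis above can be retraced in a few lines of Python. This is only a sketch: the function and variable names are illustrative, and the sums of squares and products are the deviation-score totals quoted in the text from the foot of Table 8.14.

```python
def residual_ss(sum_y2, sum_xy, sum_x2):
    """Formula (8.52): sum of squares of residuals about a regression line."""
    return sum_y2 - sum_xy ** 2 / sum_x2

# Deviation-score sums quoted in the text (foot of Table 8.14).
total = {"sum_x2": 1437.4, "sum_y2": 2649.8, "sum_xy": 1536.6}
within = {"sum_x2": 1270.4, "sum_y2": 1426.4, "sum_xy": 1136.0}
N, k = 15, 3  # total number of pupils and number of groups

ss_total = residual_ss(**total)    # residuals about the total regression line
ss_within = residual_ss(**within)  # residuals about the common within-group line
ss_among = ss_total - ss_within    # attributable to differences among final means

df_among = k - 1        # n1
df_within = N - k - 1   # n2
F = (ss_among / df_among) / (ss_within / df_within)  # formula (8.53)

print(f"residual SS: total {ss_total:.1f}, within {ss_within:.1f}, among {ss_among:.1f}")
print(f"F({df_among}, {df_within}) = {F:.2f}")
```

The computed ratio is then referred to Table F with 2 and 11 degrees of freedom, exactly as in the hand calculation.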

The various sums of squares and products used in the above
analysis may be arranged in tabular form:

                   SUM OF     SUM OF     SUM OF     SUM OF SQUARES   DEGREES
SOURCE OF          SQUARES    SQUARES    PRODUCTS   OF RESIDUALS     OF         VARIANCE
VARIATION          Σx²        Σy²        Σxy        Σy″²             FREEDOM    ESTIMATE

Among groups                                          596.6            2         298.3
  (group means)
Within groups      1,270.4    1,426.4    1,136.0      410.6           11          37.33
TOTAL              1,437.4    2,649.8    1,536.6    1,007.2           13


The “among groups” sums of squares and products are not needed 
in the analysis of covariance. 

In practice it usually is preferable to compute the sums of squares 
and products listed above from raw rather than deviation scores. 
The sums of squares may be obtained from the raw scores by 
methods described earlier in connection with testing the significance 
of differences among three or more means (see pp. 500-501). The 
sum of products may be obtained by 


a. Multiplying each X by its paired Y and summing the products.
   For the raw scores of Table 8.14, we have
   (78)(93) + (56)(73) + · · · + (45)(51) + (69)(66) = 69,718.

b. Multiplying the total sum of the X's by the total sum of the Y's
   and dividing by the total number of pairs of scores. Thus,
   (940)(1,088)/15 = 68,181.3.

c. Multiplying the sum of the X's in each group by the sum of the
   Y's in that group, dividing by the number of pairs of scores in
   that group, and summing the quotients. Thus,
   (328)(421)/5 + (322)(356)/5 + (290)(311)/5 = 68,582.

d. Subtracting (b) from (a). The difference is the total sum of
   products.

e. Subtracting (c) from (a). The difference is the within-group sum
   of products.

For the data in hand, steps (d) and (e) result in 1,536.7 as the
total sum of products and 1,136.0 as the within-group sum of
products. These values are in good agreement with those obtained
by the deviation score method.
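
Steps (a) through (e) can be sketched in Python as follows. The names are illustrative only. The raw sum of products and the group sums are those quoted in the steps above; the group size of 5 is an assumption inferred from N = 15 and three groups, and it reproduces the quotient 68,582 of step (c).

```python
# Step (a): raw sum of products, as quoted in the text.
sum_xy_raw = 69718

# (sum of X, sum of Y, number of pairs) for each group.
# The group size of 5 is inferred from N = 15 and three groups.
groups = [(328, 421, 5), (322, 356, 5), (290, 311, 5)]

total_x = sum(sx for sx, _, _ in groups)  # 940
total_y = sum(sy for _, sy, _ in groups)  # 1,088
n_total = sum(n for _, _, n in groups)    # 15

# Step (b): correction term based on the total sums.
correction_total = total_x * total_y / n_total

# Step (c): correction term based on the separate group sums.
correction_groups = sum(sx * sy / n for sx, sy, n in groups)

# Steps (d) and (e): total and within-group sums of products.
sp_total = sum_xy_raw - correction_total
sp_within = sum_xy_raw - correction_groups

print(f"total sum of products:        {sp_total:.1f}")
print(f"within-group sum of products: {sp_within:.1f}")
```

The same pattern, with squares in place of products, yields the raw-score sums of squares.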

Analysis of covariance may be used in any situation where it is 
logical to consider controlling a variable experimentally by equal- 
izing groups or matching individuals on the basis of that variable. 
It ordinarily results in a substantial reduction of within-group or 
error variance and thus leads to more precise results. The reduction, 
of course, depends upon the extent to which the initial and final 
data are correlated within groups. 

In our discussion we have touched upon only the simplest aspects 
of the technique and have not fully examined even these. As pre- 
viously noted, the assumptions underlying the technique are (1) 
linearity of regression, (2) homogeneity of residual variance within 
groups, (3) equal within-group regression coefficients, and (4) 
normality of the distribution of residuals within groups. 

The second assumption can be tested by the χ² test contained
in (8.43), but there are no convenient tests for the others. When
the numbers within groups are relatively small, as is usually the 
case, only very severe violations of the assumptions can be detected 
by statistical test. Nonetheless, the assumptions should be kept in 
mind. Before beginning an analysis, it is advisable to plot the 
data both for the total group and for each group and to fit regression 
lines to the plotted data. Inspection of the regression lines and the 
dispersion of the data about the lines will be helpful in judging 
whether the assumptions are reasonable. 

When the numbers within groups are fairly large, the assumptions 
can be tested adequately, but the tests in the main require quite 
laborious computations. Welch (Ref. 25) discusses the fundamental 
nature of these tests. A less technical discussion may be found in 
Ref. 3. 

Concluding Remarks. The F distribution is the most general
of the sampling distributions. It embraces the normal, the t, and
the χ² distributions. The student can readily check the following
relationships:

$$(CR)^2 = F_{1,\infty}, \qquad t^2 = F_{1,n}, \qquad \frac{\chi^2}{n} = F_{n,\infty},$$

by inserting the tabled values corresponding to a given probability
in each. For example, the value of χ² when n = 10 corresponding to
P = .05 is 18.3; and the value of $F_{10,\infty}$ corresponding to P = .05 is
1.83. Rider (Ref. 17) discusses the transformations and substitutions
through which the relationships can be shown to hold in general.
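
A check of this kind can be sketched in Python. The values below are assumptions in the sense that they are standard tabled 5 per cent points, not figures taken from this book's tables (only 18.3 and 1.83 appear in the text); for the normal deviate and t the two-sided 5 per cent point is used, since it corresponds to the right-hand 5 per cent point of F.

```python
# Standard tabled 5 per cent points (assumed from common statistical tables).
normal_deviate = 1.96   # two-sided 5 per cent point of the normal curve
t_n10 = 2.228           # two-sided 5 per cent point of t, n = 10
chi2_n10 = 18.307       # 5 per cent point of chi-square, n = 10

F_1_inf = 3.84          # F with 1 and infinite degrees of freedom
F_1_10 = 4.96           # F with 1 and 10 degrees of freedom
F_10_inf = 1.83         # F with 10 and infinite degrees of freedom

# Each entry pairs the left side of a relationship with the tabled F value.
checks = {
    "normal deviate squared vs F(1, inf)": (normal_deviate ** 2, F_1_inf),
    "t squared vs F(1, 10)": (t_n10 ** 2, F_1_10),
    "chi-square / n vs F(10, inf)": (chi2_n10 / 10, F_10_inf),
}

for label, (left, right) in checks.items():
    print(f"{label}: {left:.3f} against {right:.2f}")
```

Agreement is to within the rounding of the tables, which is all that can be expected of such a check.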

The F distribution is widely used in connection with analysis of 
variance and covariance—topics which we have only very briefly 
introduced. More comprehensive treatments are available in ad- 
vanced textbooks, such as Refs. 3 and 14. The analyses become 
quite complicated beyond the simple cases we considered and 
cannot be adequately discussed in an introductory book. Despite 
the ramifications, however, the underlying question is always the 
same, namely, whether one independent unbiased estimate of a 
normal population variance is significantly greater than a second,
more precise, estimate. When the answer is affirmative, the effect 
of the factor to which the larger estimate may be attributed is 
significant. If the student keeps this central idea in mind, he will be 
able to get the general sense of even quite complicated research 
reports. 

We have stressed the assumptions underlying analysis of variance 
and covariance. The fact that the total sum of squares of classified 
data can be divided into parts which may be ascribed to certain 
factors or sources is in no sense a test of these assumptions. That 
the division can be made is due to the nature of sums and squares 
of classified numbers, not to any experimental quality of the data 
themselves. Whether the data satisfy the assumptions necessary in 
demonstrating that a variance ratio is distributed in the F dis- 
tribution is an issue entirely apart from the partition of the total 
sum of squares and products. 

When the samples are relatively small, the assumptions can rarely 
be statistically validated. Empirical studies indicate that moderate 
violations of the assumptions of homogeneity and normality do not
seriously affect the distribution of the variance ratio. It is to be 
remembered, however, that a variance ratio can be proved to have 
an F distribution only if the assumptions are satisfied. 

The great advantage of methods of analysis of variance and co- 
variance is that they permit investigation of the effect of a particular 
variable, taking into account the effect of other relevant variables. 
Although, in theory, experimental control of all variables except
one is superior to statistical control, in practice the former is difficult 
to achieve and, if achieved, usually results in substantial reduction 
of sample size and loss of efficiency. Moreover, experimental control 
frequently introduces artificial conditions which make it difficult 
to generalize beyond the controlled situation. It seems quite likely 
that the methods will nullify the “law of the single variable" in 
educational research. 
Exercises 


80. In the $F_{10,30}$ distribution, what values of F correspond to the following
percentage points or probabilities: .05, .025, .01, .005, .95, .975, .99,
and .995? (Note: the four latter percentage or left-hand critical points
are the reciprocals of the .05, .025, .01, and .005 points of the $F_{30,10}$
distribution.) Sketch the 5 and the 1 per cent two-sided regions of
rejection in the $F_{10,30}$ distribution. Sketch the 5 and the 1 per cent
right-sided regions of rejection of the Р,» distribution. 

81. Suppose that one wished to extend Table F to include the .95, .975, .99,
and .995 percentage points. How would he proceed?

82. In a random sample of 5 individuals, s² = 72.5. In a second sample of
10 individuals, s² = 217.5. Is the assumption that the sampled populations
are alike in variability consistent with this evidence?

83. In exr. 41 at the end of Chapter III the results from a methods experiment
are given. In what way are the results significant? Can a t test
of the difference between means logically be applied?

84. In a correlation problem N = 100, $r_{xy}$ = .45, $\eta_{yx}$ = .65. There are
12 columns in the correlation table. (a) Test the significance of $r_{xy}$
and $\eta_{yx}$. (b) Is the assumption sound that the regression of Y on X is
linear?

85. In exr. 47, Chapter VI, how would you test the linearity of regression
of Y on X? Of X on Y?

86. The multiple coefficient of correlation А, зз; for the dental school data 
of Chapter VI was .73, with N = 146. How significant is this coeffi- 
cient? Using ср, estimate the confidence limits of Ё, зг. For what reason 
is the latter procedure questionable? 

87. Show by reference to a particular group of data, such as that on page
464, or prove algebraically, that $t^2 = F_{1,n}$.

88. How would you determine whether there are significant differences
among the 10 schools of Table I, Appendix B, in respect to achievement
in, say, reading? What assumptions would you need to make and how
would you judge their soundness?
89. In an experiment dealing with methods of teaching general science, 32,
20, and 40 pupils, respectively, were taught under three methods. The
distributions of scores on a final achievement test are shown below.
Are there significant differences among the methods at the 5 per cent
level?

METHOD
SCORE A B C
90- 2 4
85- 3 1 3
80- 2 4 4
75- 4 3 8
70- 6 1 8
65- 7 4 6
60- 4 3 4
55- 2 2
50- 2 1
45- 1 2
40- 1
NUMBER 32 20 40

90. Suppose that for each of the pupils in exr. 89 you had an IQ or a
pre-experiment score on a science test. How would you analyze the final
data? Discuss the assumptions you would need to make. In what way
would this analysis be better than that of exr. 89?

91. Suppose that you were asked by a state examining board to make a
study of the reliability of the marks assigned to 50 essay examination
papers by 4 independent examiners. How would you proceed?

92. In current educational research periodicals, find a report of a study
in which analysis of variance or covariance has been used. Be able to
interpret and criticize the study.
Appendix A


Mathematical Notes


I. Derivation of formula (4.8).


By the definition of (4.7),

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{\Sigma (X - M)^2}{N}},$$

where X is any item in a series, M the mean, and N the number of items,
the summation extending over all items. Squaring the binomial in the
radical and spreading the summation sign, we get

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - \frac{2M\,\Sigma X}{N} + \frac{\Sigma M^2}{N}}.$$

But $\Sigma M^2/N = NM^2/N = M^2$, since summing a constant N times gives N
times the constant, and $M = \Sigma X/N$. Therefore,

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - \frac{2(\Sigma X)^2}{N^2} + \frac{(\Sigma X)^2}{N^2}}.$$

Simplifying under the radical sign and removing $1/N^2$, we get

$$\sigma = \frac{1}{N}\sqrt{N\,\Sigma X^2 - (\Sigma X)^2}.$$
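
The equivalence of the defining formula and the raw-score formula can be verified numerically. The sketch below uses illustrative function names and a small set of hypothetical scores that are not taken from the book.

```python
import math

def sd_by_definition(scores):
    """Standard deviation from deviations about the mean, formula (4.7)."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / n)

def sd_by_raw_scores(scores):
    """Standard deviation from raw scores, the form derived in Note I."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt(n * sum_x2 - sum_x ** 2) / n

scores = [78, 56, 61, 45, 69]  # hypothetical raw scores
sd_def = sd_by_definition(scores)
sd_raw = sd_by_raw_scores(scores)
print(f"by definition: {sd_def:.4f}   from raw scores: {sd_raw:.4f}")
```

The raw-score form avoids computing each deviation, which is why it is preferred for hand or machine calculation.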


II. Derivation of formulas (4.9) and (4.10).


When the items in a series are grouped with class interval i constant,
assumed to have the values of the mid-points of their respective class
intervals, and coded in the d unit, the value of any item is AO + di and
the mean of the series is AO + (Σfd/N)i, where AO is the arbitrary origin
and f the frequency in a class. (See p. 95.) A deviation from the mean is,
therefore, AO + di − [AO + (Σfd/N)i] or di − (Σfd/N)i. Hence, by
definition,

$$\sigma = \sqrt{\frac{\Sigma f\left[di - \left(\frac{\Sigma fd}{N}\right)i\right]^2}{N}} = \sqrt{i^2\left[\frac{\Sigma fd^2}{N} - 2\left(\frac{\Sigma fd}{N}\right)^2 + \frac{\Sigma f}{N}\left(\frac{\Sigma fd}{N}\right)^2\right]}.$$

But $(\Sigma f/N)(\Sigma fd/N)^2 = (\Sigma fd/N)^2$, since $\Sigma f/N$ times a constant equals
the constant. Simplifying and removing $i^2$ we get

$$\sigma = i\sqrt{\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2},$$

and by further simplification

$$\sigma = \frac{i}{N}\sqrt{N\,\Sigma fd^2 - (\Sigma fd)^2}.$$


III. The sum of squares of the deviations of the items in a series
from the arithmetic mean is less than the sum of squares of their
deviations from any other value.


Let x be any deviation from the mean M. Then we have, as the sum of
the squares of the deviations from the mean,

$$\Sigma x^2 = \Sigma (X - M)^2 = \Sigma X^2 - 2M\,\Sigma X + NM^2,$$

since $\Sigma M^2 = NM^2$.
Now let $x_c$ be any deviation from some value other than the mean, say,
M + C. Then we have, as the sum of squares of the deviations from this
value,

$$\Sigma x_c^2 = \Sigma (X - M - C)^2 = \Sigma X^2 - 2M\,\Sigma X + NM^2 + NC^2 + 2NMC - 2C\,\Sigma X.$$

Since $\Sigma X = NM$, the last two terms vanish, so that

$$\Sigma x_c^2 = \Sigma X^2 - 2M\,\Sigma X + NM^2 + NC^2.$$

Thus, $\Sigma x^2$ is less than $\Sigma x_c^2$ by the amount $NC^2$. (It will be seen that the
same result is obtained whether C is positive or negative.)
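
The result of Note III lends itself to a quick numerical check. In the sketch below the scores and the values of C are hypothetical, and the names are illustrative only.

```python
def sum_sq_about(scores, value):
    """Sum of squared deviations of the scores from an arbitrary value."""
    return sum((x - value) ** 2 for x in scores)

scores = [12, 15, 9, 20, 14, 8]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n

ss_mean = sum_sq_about(scores, mean)

# Excess of the sum of squares about M + C over that about M, for several C.
excess = {c: sum_sq_about(scores, mean + c) - ss_mean for c in (-2.5, 0.5, 4.0)}

for c, extra in excess.items():
    print(f"C = {c:5.1f}: excess = {extra:.3f}, N*C^2 = {n * c ** 2:.3f}")
```

In each case the excess equals N times the square of C, whatever the sign of C.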


IV. If each score in a series of standard scores is multiplied by H
and added to K, the mean of the transformed series is K and the
standard deviation H.


Consider a series of standard scores x/σ or z. Let each score be multiplied
by the constant H and added to the constant K. Let the resulting scores
be designated by Z'. Then Z' = Hz + K, and the mean $\bar{Z}'$ of the series is,
by definition,

$$\bar{Z}' = \frac{\Sigma (Hz + K)}{N} = \frac{H\,\Sigma z}{N} + \frac{\Sigma K}{N}.$$

But $\Sigma z = \Sigma x/\sigma = 0$, since $\Sigma x = 0$, and $\Sigma K/N = K$. Hence,

$$\bar{Z}' = K.$$

The standard deviation $\sigma_{z'}$ of the transformed series is, by definition,

$$\sigma_{z'} = \sqrt{\frac{\Sigma (Z' - \bar{Z}')^2}{N}} = \sqrt{\frac{\Sigma (Hz + K - K)^2}{N}} = \sqrt{\frac{H^2\,\Sigma z^2}{N}}.$$

But $\Sigma z^2/N = \Sigma x^2/N\sigma^2 = 1$, since $\Sigma x^2/N = \sigma^2$. Therefore,

$$\sigma_{z'} = \sqrt{H^2} = H.$$
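
A short Python sketch illustrates Note IV. The raw scores and the constants H = 10 and K = 50 are hypothetical choices made only for illustration.

```python
import math

def mean_and_sd(values):
    """Mean and standard deviation (divisor N) of a series."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return m, sd

raw = [31, 44, 26, 36, 45, 37, 41, 51]  # hypothetical raw scores
m, sd = mean_and_sd(raw)

z = [(x - m) / sd for x in raw]          # standard scores
H, K = 10, 50                            # hypothetical constants
transformed = [H * score + K for score in z]

new_mean, new_sd = mean_and_sd(transformed)
print(f"mean = {new_mean:.4f}, standard deviation = {new_sd:.4f}")
```

The transformed series has mean K and standard deviation H regardless of the mean and standard deviation of the original scores.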


V. Derivation of formula (6.5) for $r_{pb}$.


Given the biserial data X and Y, such as those of Table 6.5, where X
has the value 1 or 0 and Y is continuous. Let
$Y_1$ be the Y scores paired with X scores of 1;
$Y_0$ be the Y scores paired with X scores of 0;
p be the proportion of "1" scores;
q be the proportion of "0" scores, so that p + q = 1.
Since $r_{pb}$ is a product-moment coefficient of correlation for biserial data,
we have, by formula (6.1),

$$r_{pb} = \frac{\Sigma xy}{N\sigma_x\sigma_y} = \frac{\Sigma XY - N\bar{X}\bar{Y}}{N\sigma_x\sigma_y}.$$

But $\Sigma XY = \Sigma Y_1$, since the $Y_1$ are multiplied by 1 and the $Y_0$ by 0. Also,
$\bar{X} = p$ and $\bar{Y} = \Sigma Y/N = (\Sigma Y_1 + \Sigma Y_0)/N$. Hence,

$$r_{pb} = \frac{\Sigma Y_1 - p(\Sigma Y_1 + \Sigma Y_0)}{N\sigma_x\sigma_y} = \frac{\Sigma Y_1(1 - p) - p\,\Sigma Y_0}{N\sigma_x\sigma_y} = \frac{q\,\Sigma Y_1 - p\,\Sigma Y_0}{N\sigma_x\sigma_y}.$$

Since p is the proportion of "1" scores, Np is the number of $Y_1$, i.e., the
number of Y scores in the "1" category. Then $\bar{Y}_1 = \Sigma Y_1/Np$, so that
$\Sigma Y_1 = \bar{Y}_1 Np$. Similarly, $\bar{Y}_0 = \Sigma Y_0/Nq$, so that $\Sigma Y_0 = \bar{Y}_0 Nq$. After
substitution and simplification we have

$$r_{pb} = \frac{(\bar{Y}_1 - \bar{Y}_0)\,pq}{\sigma_x\sigma_y}.$$

Following the method of proof on p. 361, it may readily be shown that
$\sigma_x = \sqrt{pq}$. Substituting $\sqrt{pq}$ for $\sigma_x$ and simplifying, we finally have

$$r_{pb} = \frac{(\bar{Y}_1 - \bar{Y}_0)\sqrt{pq}}{\sigma_y}.$$
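
That the final formula agrees with the ordinary product-moment coefficient can be checked numerically. The sketch below uses hypothetical data and illustrative function names; standard deviations are computed with divisor N, as in the derivation.

```python
import math

def pearson_r(xs, ys):
    """Product-moment coefficient computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(xs, ys):
    """(mean Y1 - mean Y0) * sqrt(pq) / sigma_y, the form derived in Note V."""
    n = len(xs)
    y1 = [y for x, y in zip(xs, ys) if x == 1]
    y0 = [y for x, y in zip(xs, ys) if x == 0]
    p = len(y1) / n
    q = 1 - p
    my = sum(ys) / n
    sigma_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return (sum(y1) / len(y1) - sum(y0) / len(y0)) * math.sqrt(p * q) / sigma_y

# Hypothetical data: X is 1 or 0, Y is a continuous score.
x = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y = [52, 47, 39, 58, 41, 35, 49, 44, 55, 38]

r_product_moment = pearson_r(x, y)
r_pb = point_biserial(x, y)
print(f"product-moment r = {r_product_moment:.4f}, derived formula = {r_pb:.4f}")
```

The two values coincide, since the derived formula is only an algebraic rearrangement of the product-moment coefficient for a two-valued X.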


VI. Derivation of formula (6.8) for $r_\phi$.


Let the frequencies in the 2 x 2-fold correlation table be A, B, C, and D,
and let the table be completed as shown below.

              d_x = 0   d_x = 1    f_y     d_y   f_y d_y   f_y d_y²   Σd_x   d_y Σd_x
d_y = 1          A         B      A + B     1    A + B     A + B       B        B
d_y = 0          C         D      C + D     0      0         0         D        0
f_x            A + C     B + D      N            A + B     A + B                B
d_x              0         1
f_x d_x          0       B + D    B + D
f_x d_x²         0       B + D    B + D

Then $\Sigma f_x d_x = B + D$, $\Sigma f_x d_x^2 = B + D$, $\Sigma f_y d_y = A + B$, $\Sigma f_y d_y^2 = A + B$,
and $\Sigma (d_y \Sigma d_x) = B$. Substituting these values in formula (6.7), we have

$$r_\phi = \frac{NB - (B + D)(A + B)}{\sqrt{[N(B + D) - (B + D)^2][N(A + B) - (A + B)^2]}}$$

$$= \frac{NB - BA - B^2 - BD - AD}{\sqrt{(B + D)(N - B - D)(A + B)(N - A - B)}}$$

$$= \frac{B(N - A - B - D) - AD}{\sqrt{(B + D)(N - B - D)(A + B)(N - A - B)}}.$$

But N − A − B − D = C, N − B − D = A + C, and N − A − B =
C + D. Substituting and rearranging terms, we have

$$r_\phi = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}}.$$
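
The final formula can be compared with the product-moment coefficient computed directly from coded scores. In this sketch the four cell frequencies are hypothetical, the names are illustrative, and the cells are coded as in the derivation: A and B lie in the row coded 1 on Y, B and D in the column coded 1 on X.

```python
import math

def phi_from_cells(a, b, c, d):
    """(BC - AD) / sqrt((A+B)(C+D)(A+C)(B+D)), the form derived in Note VI."""
    return (b * c - a * d) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pearson_r(xs, ys):
    """Product-moment coefficient computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

A, B, C, D = 10, 25, 30, 15  # hypothetical cell frequencies

# Expand the table into (d_x, d_y) pairs using the coding of the derivation.
pairs = [(0, 1)] * A + [(1, 1)] * B + [(0, 0)] * C + [(1, 0)] * D
xs = [dx for dx, _ in pairs]
ys = [dy for _, dy in pairs]

phi = phi_from_cells(A, B, C, D)
r = pearson_r(xs, ys)
print(f"phi from cells = {phi:.4f}, product-moment r = {r:.4f}")
```

The agreement of the two values reflects the fact that the fourfold coefficient is simply the product-moment coefficient for two variables each scored 1 or 0.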
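VII. Derivation of formula (6.17).


The residuals about the line of regression of Y on X are defined as the
differences between observed and predicted scores. Let y and y' be observed
and predicted scores, respectively, in deviation form. Then the standard
deviation $\sigma_{y \cdot x}$ of the residuals y − y' is, by definition,

$$\sigma_{y \cdot x} = \sqrt{\frac{1}{N}\Sigma (y - y')^2}.$$

From equation (6.12), $y' = (\Sigma xy/\Sigma x^2)x$. Hence,

$$\sigma_{y \cdot x} = \sqrt{\frac{1}{N}\Sigma\left(y - \frac{\Sigma xy}{\Sigma x^2}x\right)^2}
= \sqrt{\frac{\Sigma y^2}{N} - \frac{2(\Sigma xy)(\Sigma xy)}{N\,\Sigma x^2} + \frac{(\Sigma x^2)(\Sigma xy)^2}{N(\Sigma x^2)^2}}
= \sqrt{\frac{\Sigma y^2}{N} - \frac{(\Sigma xy)^2}{N\,\Sigma x^2}}.$$

Squaring equation (6.2) gives $r_{xy}^2 = (\Sigma xy)^2/\Sigma x^2\,\Sigma y^2$, so that $(\Sigma xy)^2/\Sigma x^2
= \Sigma y^2\,r_{xy}^2$. Making this substitution we have,

$$\sigma_{y \cdot x} = \sqrt{\frac{\Sigma y^2}{N} - \frac{\Sigma y^2\,r_{xy}^2}{N}} = \sqrt{\frac{\Sigma y^2}{N}\left(1 - r_{xy}^2\right)},$$

from which, replacing $\Sigma y^2/N$ with $\sigma_y^2$, we obtain

$$\sigma_{y \cdot x} = \sigma_y\sqrt{1 - r_{xy}^2}.$$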


VIII. The variance of obtained scores is equal to the sum of the
variances of true scores and errors, i.e., $\sigma_o^2 = \sigma_\infty^2 + \sigma_e^2$, provided the
errors are uncorrelated with true scores.


Following equation (7.3), $X_o = X_\infty + E$, where $X_o$ is an obtained score,
$X_\infty$ a true score, and E an error of measurement. In deviation score form,
$x_o = x_\infty + e$. Then the variance $\sigma_o^2$ of obtained scores is, by definition,

$$\sigma_o^2 = \frac{\Sigma x_o^2}{N} = \frac{\Sigma (x_\infty + e)^2}{N}.$$

Squaring the binomial at the right and spreading the summation sign, we
get

$$\sigma_o^2 = \frac{\Sigma x_\infty^2}{N} + \frac{2\,\Sigma x_\infty e}{N} + \frac{\Sigma e^2}{N}.$$

If the errors are uncorrelated with the true scores, $\Sigma x_\infty e = 0$, and we may
write

$$\sigma_o^2 = \sigma_\infty^2 + \sigma_e^2.$$


IX. Derivation of formula (7.7).


Suppose that each of N individuals in a group has been given parallel
forms of a test or the same test twice. Then for each individual we have
the two obtained scores $X_1$ and $X_2$. Following equation (7.3) we may write,
in deviation score form,

$$x_1 = x_\infty + e_1, \qquad x_2 = x_\infty + e_2.$$

We shall have N such pairs of scores, and the product-moment coefficient
of correlation between them defines the reliability coefficient $r_{11}$. Then,

$$r_{11} = \frac{\Sigma x_1 x_2}{N\sigma_1\sigma_2} = \frac{\Sigma (x_\infty + e_1)(x_\infty + e_2)}{N\sigma_1\sigma_2}.$$

Multiplying in the numerator at the right and spreading the summation
sign, we get

$$r_{11} = \frac{\Sigma x_\infty^2 + \Sigma x_\infty e_2 + \Sigma e_1 x_\infty + \Sigma e_1 e_2}{N\sigma_1\sigma_2}.$$

If there is no correlation between errors and between errors and true scores,
the three latter terms in the numerator vanish. Moreover, if the tests are
truly equivalent, $\sigma_1 = \sigma_2 = \sigma_o$, say. Under these conditions we may write,
replacing $\Sigma x_\infty^2/N$ with $\sigma_\infty^2$,

$$r_{11} = \frac{\sigma_\infty^2}{\sigma_o^2}.$$


Appendix B


Table I. Chronological Ages, Mental Ages, Intelligence Quotients,
and Scores on the Metropolitan Achievement Tests of
293 Eighth-Grade Pupils, by Schools

Table II. Admission Data and First Semester Performance of 146
Liberal Arts College Male Freshmen

Table III. Normally Distributed Scores of 400 Individuals, 200 of
Whom Have Attribute A and 100 Attribute B (Data
Fictitious)

TABLE I
CHRONOLOGICAL AGES, MENTAL AGES, INTELLIGENCE QUOTIENTS,
AND SCORES ON THE METROPOLITAN ACHIEVEMENT
TESTS OF 293 EIGHTH-GRADE PUPILS, BY SCHOOLS^a


SCH. | PUPIL NO. | SEX | C.A. | M.A. | IQ^b | READING | VOCAB. | ARITH. FUNDAMENTALS | ARITH. PROBS.
A 000 С | 156 | 134 87 19 22 34 11 
N =23| 001 G | 159 | 180 | 110 44 46 36 26 
002 G | 153 | 157 | 102 24 34 34 10 
003 B | 158 | 164 | 103 43 50 34 22 
004 G | 160 | 145 91 33 39 24 9 
005 | G | 149 | 139| 94| 29 37 34 6 
006 B | 189 | 150 81 36 41 27 15 
007 С | 176 | 156 | 90| 25 25 26 14 
008 G | 177 | 147 85 29 34 28 21 
009 В | 188] 141 | 76| 23 31 30 16 
010 В | 148 | 161 | 109 31 43 31 14 
011 а | 154 | 158 | 102 33 37 33 16 
012 G | 170 | 145 87 38 34 35 14 
013 С | 154| 135 88 18 22 39 10 
014 В | 151 | 161 | 106 33 40 34 23 
015 В | 155 | 164 | 105 | 33 46 34 21 
016 | С | 160 | 145 | 91| 39 45 29 19 
017 В | 151 | 145 95 33 31 32 11 
018 В | 158 | 143 91 37 47 31 18 
019 G | 159 | 145 91 32 33 34 16 
020 G | 160 | 161 | 101 36 38 16 
021 В | 178 | 157 89 37 36 16 
022 В | 160 | 170 | 104 40 42 14 
B 023 G | 153 | 129 85 24 34 13 
N =18| 024 G | 166 | 135 84 40 44 4 
025 G | 158 | 142 91 37 30 8 
026 С | 146 | 146 | 100 57 52 13 
027 В | 175 | 139 81 48 32 20 
028 С | 150 | 159 | 106 39 42 17 
029 С | 146 | 127 88 48 43 21 



030 G | 175 | 140 82 26 19 29 8 
031 G | 159 | 171 | 105 39 32 28 8 
032 B | 146 | 149 | 102 4T 40 14 5 
033 G | 152 | 161 | 106 32 27 28 14 
034 G | 148 | 150 | 101 24 26 21 5 
035 B | 166 | 130 81 30 23 23 8 
036 G | 153 | 144 94 30 30 32 14 
037 В | 145 | 167 | 114 39 30 32 13 
038 | В | 151| 132| 88| 35 24 31 15 
039 | В | 163 | 139 | 87| 28 22 33 15 
040 РЕҢ 144 | 125 50 34 26 23 2 
с 041 В | 161 | 199 | 117 45 50 24 13 
М = 38| 042 | G | 158 | 165 | 104 | 39 35 34 11 
043 B | 140 | 199 | 131 49 50 35 19 
044 G | 171 | 154 92 46 38 37 13 
045 | G | 158 | 180 | 111 | 43 46 38 20 
046 | G | 163 | 169 | 103 | 40 31 31 9 
047 | G | 166| 144| 88| 34 35 23 2 
048 | B | 149 | 184 | 120| 41 41 33 13 
049 С | 163 | 175 | 105 45 50 24 10 
050 | G | 149 | 171 | 113 | 32 38 21 12 
051 G | 188 | 139 175 21 24 1T 2 
052 | G| 165| 140| 87| 32 29 31 6 
053 | G | 161 | 147| 92| 33 29 16 4 
054 | B | 163| 142| 89| 22 24 21 2 
055 В | 172| 158 94 43 31 35 11 
056 | В | 164| 135 | 85| 33 34 10 0 
057 В | 177 | 142 82 29 31 10 2 
058 С | 169 | 142 86 29 30 2 1 
059 G | 157 | 124 81 19 22 19 2 
060 В | 160 | 137 87 30 36 24 3 
061 | В | 154] 151| 98) 23 20 46 17 
062 | G | 158 | 116| 74| 18 16 19 4 
063 | В | 177 | 135 | 7 15 17 28 те 
064 | В | 182 | 147) 82] 35 35 31 14 
065 B | 165 75 | 106 45 38 22 13 
06 | G | 170] 161) 96| 35 39 34 10 
067 В | 183 | 135 76 22 16 37 6 
068 G | 171 | 146 87 31 31 26 10 
069 С | 169 | 154 93 21 34 23 8 
070 | G | 169 | 161| 96| 39 36 23 1 
071 С | 188 | 156 85 35 24 31 13 
072 G | 175 | 127 74 31 31 25 2 
073 | B | 188 | 161| 87| 34 32 20 12 
074 G | 193 | 142 76 29 32 18 4 
075 В | 162 | 135 86 28 24 33 10 
076 В | 168 | 139 85 32 35 21 6 
077 В | 174 | 176 | 101 46 4T 34 21 
078 | Bj 154 | 188 | 118 41 40 29 5 
р 079 G | 175 | 144 84 21 38 32 13 
М = 37 
080 В | 162 | 192 | 114 49 41 29 11 
081 G | 159 | 132] 85| 21 27 26 9 
082 В | 178 | 175 98 36 45 31 17 
083 С | 148 | 165 | 111 | 25 44 36 13 
084 | С | 162 | 161 | 100 7 46 35 12 
085 | В | 163 | 140) 87] 30 33 25 8 
086 В | 155 | 182 | 104 36 37 48 25 
087 | B | 160 | 154 7| 38 38 19 6 
088 В | 165 | 161 98 4l 40 36 23 
089 | С | 156 | 132| 87| 25 29 33 6 
090 | В | 166| 158| 96| 34 41 30 14 
091 В | 152 | 153 | 101 35 38 30 10 
092 G | 173 | 146 86 39 39 23 4 
093 | В | 172 | 138 | 83| 26 34 21 10 
094 | С | 179 | 129| 74| 28 29 26 4 
095 В | 160 | 176 | 108 36 47 26 19 
096 | G | 155 | 171 | 109| 39 48 35 16 
097 В | 157 | 158 | 101 44 38 33 19 
098 | В | 161 | 19| 75| 22 27 20 5 
099 | В | 184 | 144| 79| 35 41 24 10 


TABLE I. (Continued) 


ARITH. 
scu. | РОР |sex] c.a. | ма. | 10 FUNDA- RET 
NO. wEwrALS| PROBS: 
100 С | 153 | 129 85 25 6 
101 G | 160 | 149 94 39 10 
102 В | 148 | 182 | 120 36 18 
103 В | 144 | 184 | 123 32 11 
104 G | 182 | 127 70 29 23 
105 В | 159 | 180 | 110 29 24 
106 G | 179 | 163 92 33 1 
107 С | 156 | 140 | 91 18 4 
108 G | 154 | 159 | 103 24 8 
109 В | 157 | 153 97 24 11 
110 В | 173 | 156 92 23 15 
111 В | 175 | M4 84 28 11 
112 В | 148 | 178 | 118 30 23 
113 B | 164 | 211 | 120 28 21 
114 G | 147 | 150 | 103 28 11 
115 | B | 169 | 194 | 112 44 29 
Е 116 G | 160 | 153 96 35 12 
N = 35| 117 С | 158 | 144 92 43 13 
118 С | 158| 154 98 50 24 
119 С | 166 | 135 84 38 23 
120 В | 158 | 165 | 104 46 45 4T 23 
121 G | 157 | 165 | 104 42 34 46 23 
122 G | 157 | 159 | 102 43 40 42 18 
123 С | 154 | 178 | 113 Al 36 47 24 
124 В | 165 | 150 92 34 37 31 8 
125 G | 168 | 158] 96 45 45 37 14 
126 В | 153 | 189 | 117 50 48 43 19 
127 С | 164 | 147 91 43 42 42 19 
128 С | 157 | 157 | 100 18 29 36 12 
129 В | 176 | 147 85 33 32 42 15 
130 С | 168 | 168 | 100 36 36 44 22 
131 G | 186 | 138 75 31 29 39 17 
132 С | 170 | 122 71 31 34 24 8 
133 В | 154 | 156 | 101 36 51 42 25 
134 С | 188 | 161 87 | 34 32 20 12 
135 G | 171 | 194 | 111 48 45 51 29 
136 В | 153 | 153 | 100 | 31 41 39 23 
137 В | 154| 149] 96| 33 42 40 18 
138 В | 144 | 182 | 122| 50 41 37 20 
139 В | 154 | 151| 98] 23 25 36 22 
140 G | 189 | 163 | 88| 41 39 30 13 
141 С | 177 | 158| 91| 34 38 35 YR 
142 В |166 | 146| 89| 37 29 27 15 
143 В | 164 | 150] 92| 28 31 44 23 
144 В | 165 | 160] 97| 37 32 28 15 
145 С | 169 | 153| 92| 33 37 35 25 
146 С | 183 | 146 | 81 29 26 36 8 
147 С | 166 | 123| 76| 22 18 36 14 
148 С | 172 | 130| 79| 27 24 29 9 
149 С | 163 | 158| 97| 40 29 37 11 
150 В | 169 | 169 | 100 | 32 37 39 19 
F 151 G | 152 | 175 | 113 | 45 46 29 17 
N = 35| 152 G | 160 | 153| 96| 38 41 39 21 
153 G | 180 | 175 | 97| 43 47 30 10 
154 В | 158 | 165 | 104| 47 45 43 19 
155 В | 156 | 169 | 106 | 43 50 36 15 
156 В | 167 | 161 97| 43 41 35 21 
157 В | 156 | 190 | 116 | 41 42 42 23 
158 G | 153 | 134| 89| 24 33 36 18 
159 В | 153 | 129| 85| 27 34 26 a 
160 G | 151 | 156 | 103 | 38 34 37 26 
161 G | 187 | 142 | 7 37 38 23 T 
162 G | 187 | 150| 81 13 39 20 T 
163 G | 155 | 180| 114| 46 48 32 9 
164 G | 177 | 167 | 96| 35 31 38 15 
165 G | 164 | 145] 89| 33 36 30 12 
166 В | 167 | 159| 96| 33 38 32 17 
167 |. @ | 162 | 129| 82| 28 24 41 19 
168 В | 173 | 180 | 104| 34 46 44 21 
169 В | 166 | 226 | 125' 47 45 44 28 
TABLE I. (Continued)


ARITH. 


PUPIL ARITH. 
NOSE eon шы FUNDA- | props. 
MENTALS 
170 B | 163 20 
171 B | 170 11 
172 G | 191 17 
173 G | 162 27 
174 В | 160 27 
175 G | 161 19 
176 С | 155 21 
177 G | 154 21 
178 В | 180 15 
179 | В | 152 23 
180 В | 151 20 
181 G | 149 24 
182 С | 162 25 
183 | С | 158 23 
184 G | 153 21 
185 | G | 162 19 
B | 162 10 
G | 154 23 
B | 158 12 
B | 154 15 
B | 150 8 
С | 148 18 
В | 179 13 
С | 183 1 
В | 163 11 
В | 158 3 
B | 170 4 
G | 158 9 
G | 177 0 
G | 177 7 
B | 180 0 
В | 195 0 
В | 154 20 
G | 192 1 
G | 151 19 


TABLE I. (Continued) 


ь | READ- ARITH. 
SCH. 0 ма оса. | HONDA SONS 
MENTALS 

93 | 28 29 28 7 
110 | 43 49 31 18 
90| 40 24 12 6 
88| 30 29 37 7 
109 | 42 44 47 23 
101 | 36 41 29 22 
106 | 47 49 31 21 
98 | 42 45 27 15 
110| 45 45 29 14 
89 | 33 39 23 7 
86| 21 26 19 13 
79 | 30 29 23 11 
89| 40 34 31 17 
н 85| 31 36 25 4 
N = 29 85 | 28 38 27 13 
95 | 37 46 28 12 
91 28 22 22 7 
12| 41 45 31 15 
100 | 46 43 29 8 
105 | 47 44 24 9 
87 | 24 31 18 14 
124 | 50 49 40 20 
т 37 29 15 5 
81| 29 33 17 7 
85 | 30 29 32 9 
100 | 46 36 31 17 
96 | 34 38 31 13 
94| 41 45 36 17 
78| 19 15 27 8 
121 | 48 46 49 24 
96| 30 39 17 Ж 
109 | 35 45 33 20 
79| 33 24 32 15 
97| 28 24 31 13 
107 | 46 48 30 21 
TABLE I. (Continued)


bius C.A. | M.A. | 10^ READ-| voca, ddr IUE 
SEX „А. LA. Я - " z 
NO. ING MENTALS PROBS. 
240 B | 148 | 149 | 101 35 46 32 13 
241 B | 148 | 181 | 117 50 53 15 12 
242 G | 163 | 172 | 104 49 46 24 22 
243 | B | 150| 146| 97| 34 36 20 9 
244 G | 157 | 179 | 111 50 50 41 22 
245 | В | 165 | 179 | 106 | 42 46 21 14 
246 С | 156 | 181 | 112 50 41 D 19 
247 С | 187 | 171 90 30 39 25 12 
248 В | 161 | 167 | 103 38 36 41 24 
249 В | 151 | 134 89 31 29 20 4 
250 С | 165 | 135 84 27 29 18 6 
251 С | 160 | 158 99 28 41 37 24 
252 В | 160 | 127 80 23 32 33 7 
253 | В | 172| 146| 87| 24 20 31 15 
254 | В | 174| 127 | 75) 20 32 24 1 
255 | G | 176| 132] 77| 25 22 27 if 
256 | В | 155 | 153 | 99| 34 29 26 13 
257 В | 157 | 128 84 12 26 28 
258 | G | 157 | 146] 94| 29 43 30 2 
259 | G | 154 | 176 | 112| 39 43 31 17 
260 | G | 167 | 151| 92| 32 29 37 17 
261 С | 167 | 184 | 108 42 47 43 24 
262 В 44 48 26 9 
263 | G 32 27 32 12 
264 В 34 44 35 11 
265 | B 38 36 34 9 
266 B 21 29 33 11 
267 в 20 27 32 7 
268 G 31 22 23 1 
269 | G 33 38 39 19 
270 | G 29 27 30 10 
271 в 32 36 29 14 
272 | В 23 16 15 11 
273 B 18 24 29 7 
274 | С 18 16 25 3 
TABLE I. (Continued)


ARITH. 
PUPIL " " ARITH. 
sc AO SEX ю VOCAB. | FUNDA- | оры, 
MENTALS 
215 G 96 41 12 4 
276 с 96 42 22 12 
277 а 97 42 26 9 
278 B 93 34 31 15 
279 B 94 39 25 10 
280 B 103 42 36 26 
281 G 89 26 20 8 
282 B 81 20 31 20 
283 В 101 35 33 18 
284 В 85 34 30 7 
285 B 89 35 28 10 
286 G 91 24 32 14 
287 с 85 33 32 18 
288 в 89 37 28 и 
289 G 93 30 32 15 
290 G 72 24 6 6 
291 G 94 24 23 9 
292 G 83 26 25 5 


a The data in the table were obtained during a school survey in a medium-size,
eastern industrial city. One elementary school was selected at random from
each of the 10 geographical areas of that city, and one section of each of grades
3 to 8 was selected at random from each school to take intelligence and achievement
tests. The 10 eighth-grade sections so selected are included in the table.
The tests used were the Pintner General Ability Tests and the Metropolitan
Tests, published by the World Book Company, Yonkers, New York. Scores
on the English, Literature, History, Geography, and Spelling tests of the
Metropolitan Battery are not included in the table.

b The Pintner IQ is obtained by standard score procedure, not by dividing
mental age by chronological age. The procedure is discussed in the manual
accompanying the Pintner General Ability Tests.
TABLE III


NORMALLY DISTRIBUTED SCORES OF 400 INDIVIDUALS, 200
OF WHOM HAVE ATTRIBUTE A AND 100 ATTRIBUTE B


(Data fictitious)

000 30 | 050 40 A | 100 34 B | 150 29 | 200 50 A | 250 60 A B
001 40 B | 051 45 A | 101 23 B | 151 44 B | 201 44 B | 251 41 A
002 26 A | 052 33 B | 102 43 A | 152 37 A | 202 32 B | 252 38 A A
003 36 A | 053 35 | 103 33 | 153 49 A | 203 40 A | 253 43 A B
004 45 A | 054 38 A | 104 47 A | 154 40 A | 204 35 A | 254 22 A
005 37 B | 055 11 | 105 40 A | 155 50 B | 205 42 A | 255 32
006 41 | 056 36 A | 106 28 A | 156 33 | 206 34 A | 256 39 A A
007 51 A | 057 24 A | 107 44 B | 157 39 A | 207 27 | 257 37 A A
008 35 B | 058 37 B | 108 30 | 158 22 B | 208 49 A | 258 28 | 308 31 A A
009 25 | 059 51 | 109 38 B | 159 20 A | 209 37 B | 259 34 A | 309 38 B A
010 31 A | 060 41 A | 110 45 A | 160 31 A | 210 46 | 260 40 B | 310 40 B
011 44 | 061 30 | 111 37 A | 161 42 B | 211 25 A | 261 29 B | 311 25 A A
012 44 A | 062 32 A | 112 25 A | 162 35 | 212 46 A | 262 41 A | 312 37 B A
013 38 B | 063 45 | 113 42 A | 163 47 A | 213 51 | 263 38 B | 313 50 A A
014 48 A | 064 38 A | 114 27 A | 164 29 A | 214 28 A | 264 44 A | 314 44 A A
015 39 | 065 42 B | 115 39 A | 165 38 B | 215 34 B | 265 29 A | 315 27
016 29 B | 066 31 | 116 32 A | 166 52 | 216 33 | 266 53 A | 316 43 A B
017 19 B | 067 45 A | 117 32 A | 167 32 A | 217 41 B | 267 35 B | 317 37
018 15 | 068 39 A | 118 41 | 168 44 B | 218 23 A | 268 46 B | 318 40 B A
019 48 A | 069 25 A | 119 21 A | 169 51 A | 219 45 A | 269 30 A | 319 42 B A
020 42 | 070 38 A | 120 43 A | 170 33 | 220 38 A | 270 31 B | 320 33 B A
021 33 | 071 42 B | 121 37 A | 171 13 B | 221 45 A | 271 49 A | 321 20 B
022 46 A | 072 52 | 122 28 B | 172 46 B | 222 54 B | 272 24 A | 322 31 A
023 32 | 073 30 A | 123 39 B | 173 34 A | 223 28 A | 273 52 | 323 45 A | 373 34 B
024 49 B | 074 41 B | 124 41 B | 174 43 A | 224 31 | 274 39 | 324 50 | 374 42 A
025 33 | 075 47 A | 125 34 A | 175 30 A | 225 39 B | 275 57 A | 325 36 | 375 27
026 54 B | 076 34 | 126 45 | 176 45 A | 226 45 A | 276 31 A | 326 41 B | 376 27 A
027 39 | 077 44 | 127 36 A | 177 50 A | 227 48 | 277 36 B | 327 47 B | 377 35 B
028 27 | 078 37 A | 128 26 | 178 36 B | 228 26 A | 278 33 B | 328 26 B | 378 41 A
029 35 | 079 26 | 129 35 A | 179 28 A | 229 35 | 279 42 A | 329 35 B | 379 38
030 47 A | 080 32 A | 130 43 A | 180 46 | 230 48 A | 280 33 A | 330 40 B | 380 44 A
031 42 A | 081 39 | 131 47 A | 181 57 A | 231 36 A | 281 53 | 331 48 | 381 21 A
032 55 A | 082 29 B | 132 34 B | 182 32 | 232 42 B | 282 41 A | 332 32 A | 382 51 A
033 32 A | 083 36 B | 133 43 A | 183 40 A | 233 53 A | 283 45 B | 333 46 | 383 39 A
034 44 B | 084 53 A | 134 56 B | 184 39 | 234 31 | 284 27 A | 334 39 A | 384 34 A
035 52 | 085 28 A | 135 42 A | 185 43 B | 235 44 B | 285 51 A | 335 42 A | 385 38 B
036 19 | 086 48 A | 136 69 A | 186 48 B | 236 57 A | 286 42 A | 336 16 A | 386 50
037 30 A | 087 38 | 137 41 A | 187 33 A | 237 34 A | 287 47 | 337 54 B | 387 55 A
038 43 A | 088 49 B | 138 52 A | 188 46 B | 238 42 B | 288 36 A | 338 38 B | 388 36 A
039 55 B | 089 50 A | 139 37 A | 189 48 | 239 49 A | 289 44 A | 339 58 | 389 54 B
040 55 A | 090 35 B | 140 54 | 190 48 A | 240 37 A | 290 38 A | 340 29 A | 390 46
041 23 | 091 58 A | 141 46 | 191 24 B | 241 61 A | 291 54 A | 341 41 B | 391 49 A
042 49 A | 092 46 A | 142 51 B | 192 47 B | 242 48 A | 292 59 B | 342 56 A | 392 22
043 43 B | 093 59 A | 143 34 B | 193 59 | 243 31 A | 293 34 | 343 37 | 393 43 A
044 52 | 094 36 | 144 47 B | 194 40 A | 244 18 A | 294 47 A | 344 51 | 394 55 B
045 36 A | 095 61 A | 145 58 | 195 64 B | 245 43 B | 295 41 | 345 62 A | 395 35 A
046 37 A | 096 47 | 146 42 A | 196 65 | 246 51 A | 296 46 B | 346 43 A | 396 63 B
047 38 B | 097 56 | 147 17 A | 197 35 A | 247 40 A | 297 43 B | 347 40 | 397 44 A
048 46 | 098 29 A | 148 67 A | 198 60 | 248 53 A | 298 53 B | 348 52 | 398 52 A
049 50 A | 099 40 | 149 31 A | 199 57 B | 249 35 A | 299 36 B | 349 48 | 399 33
Appendix C


Table A. Proportion of Total Area under the Normal Curve
between Mean Ordinate and Ordinate at Given z Distance
from the Mean

Table B. Ordinates (y’) of the Standard Normal Curve


Table C. Values of z_r for Given Values of the Product-Moment
Coefficient of Correlation


Table D. Values of t Corresponding to Given Probabilities


Table E. Values of χ² Corresponding to Given Probabilities


Table F. Values of F Corresponding to Given Probabilities


Table G. 90 and 98 Per Cent Sampling Limits of α3 and α4 for
TABLE Cᵃ


VALUES OF z_r FOR GIVEN VALUES OF THE PRODUCT-
MOMENT COEFFICIENT OF CORRELATION
r  z_r | r  z_r | r  z_r | r  z_r
.12 .12 | .37 .39 | .62 .73 | .87 1.33
.15 .15 | .40 .42 | .65 .78 | .90 1.47
.17 .17 | .42 .45 | .67 .81 | .92 1.59
.21 .21 | .46 .50 | .71 .89 | .96 1.95
.22 .22 | .47 .51 | .72 .91 | .97 2.09
.23 .23 | .48 .52 | .73 .93 | .98 2.30
.24 .24 | .49 .54 | .74 .95 | .99 2.65


a Table C is adapted from Table V.B. of Fisher, Statistical Methods
for Research Workers, published 1950 by Oliver and Boyd, Ltd., 
Edinburgh, by permission of the author and publishers. 
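The entries of Table C can be recomputed directly, since the transformation from r to z_r is the inverse hyperbolic tangent, z_r = ½ ln[(1 + r)/(1 − r)]. The following short sketch is offered only as a check on the table; the function names are illustrative and are not part of the original text.

```python
import math

def fisher_z(r):
    """Transform a correlation r (-1 < r < 1) to z_r = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def inverse_fisher_z(z):
    """Transform z_r back to the correlation scale."""
    return math.tanh(z)

# Reproduce a few tabled pairs, rounded to two decimals as in the table.
for r in (0.24, 0.49, 0.74, 0.99):
    print(f"r = {r:.2f}   z_r = {fisher_z(r):.2f}")
```

For small r the two values nearly coincide; they diverge rapidly as r approaches 1, which is why the last column of the table climbs so steeply.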
TABLE Dᵃ


VALUES OF t CORRESPONDING TO GIVEN PROBABILITIESᵇ
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-------------------------------------------- 
a Table D is abridged from Table III of Fisher and Yates, Statistical Tables for Biological,
Agricultural, and Medical Research, published by Oliver and Boyd, Ltd., Edinburgh,
by permission of the authors and publishers.


b In using this table it must be remembered that the probability figures are based upon both
tails of the distribution. The probability of a deviation greater than the tabled value of t
(or less than −t) is 1/2 of the corresponding probability figure at the head of the column. A table of t
giving the areas included between the ordinates at various values of t and  for n = 1 to n = 20
may be found in Peters and Van Voorhis, Statistical Procedures and Their Mathematical Bases.
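The relation between the two-tailed figures at the head of the columns and the one-tailed probability can be checked numerically. The sketch below integrates the density of t by Simpson's rule; it assumes that the n of the table is the number of degrees of freedom (called df here), and the function names are illustrative only.

```python
import math

def t_density(t, df):
    """Ordinate of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Probability of a deviation beyond +t or -t (both tails); steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_t = total * h / 3  # Simpson's rule for the area between 0 and t
    return 1 - 2 * area_zero_to_t

def one_tailed_p(t, df):
    """Probability in a single tail: half of the two-tailed figure."""
    return two_tailed_p(t, df) / 2

p_both = two_tailed_p(2.228, 10)  # t = 2.228 is the familiar .05 point for 10 df
p_one = one_tailed_p(2.228, 10)
```

With 10 degrees of freedom and t = 2.228, the two-tailed probability comes out very close to .05 and the one-tailed probability very close to .025.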
TABLE Gᵃ
90 AND 98 PER CENT SAMPLING LIMITS OF α3
AND α4 FOR SAMPLES OF VARIOUS SIZES FROM
A NORMAL POPULATION


SIZE OF SAMPLE | α3 90% | α3 98% | α4 90% | α4 98%
50 | ±.53 | ±.79 | — | —
100 | ±.39 | ±.57 | 2.35-3.77 | 2.18-4.39
125 | ±.35 | ±.51 | 2.40-3.70 | 2.24-4.24
150 | ±.32 | ±.46 | 2.45-3.65 | 2.29-4.14
175 | ±.30 | ±.43 | 2.48-3.61 | 2.33-4.05
200 | ±.28 | ±.40 | 2.51-3.57 | 2.
250 | ±.25 | ±.36 | 2.55-3.52 | 2.
300 | ±.23 | ±.33 | 2.59-3.47 | 2.
350 | ±.21 | ±.30 | 2.62-3.44 | 2.
400 | ±.20 | ±.28 | 2.64-3.41 | 2.
500 | ±.18 | ±.26 | 2.67-3.37 | 2.57-3.60
600 | ±.16 | ±.23 | 2.70-3.34 | 2.60-3.54
700 | ±.15 | ±.22 | 2.72-3.31 | 2.62-3.50
800 | ±.14 | ±.20 | 2.74-3.29 | 2.65-3.46
900 | ±.13 | ±.19 | 2.75-3.28 | 2.66-3.43
1,000 | ±.13 | ±.18 | 2.76-3.26 | 2.68-3.41
1,200 | ±.12 | ±.16 | 2.78-3.24 | 2.71-3.37
1,600 | ±.10 | ±.14 | 2.81-3.21 | 2.74-3.32
2,000 | ±.09 | ±.13 | 2.83-3.18 | 2.77-3.28
5,000 | ±.06 | ±.08 | 2.89-3.12 | 2.85-3.17


a Table G is adapted from Table IV of E. S. Pearson’s
“A Further Development of Tests of Normality,” copyright
1930 by Biometrika, 22:239-249, and used by
permission of author and editor. 
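To use Table G, the skewness and kurtosis of a sample must first be computed. The sketch below assumes that the two symbols in the table title denote the usual moment ratios, α3 = m3 / m2^(3/2) and α4 = m4 / m2², where m2, m3, and m4 are the second, third, and fourth moments about the mean. The function names are illustrative, and the limits are those shown in the table for samples of 200.

```python
def moment_ratios(scores):
    """Return (alpha3, alpha4): skewness and kurtosis as ratios of central moments."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# 90 per cent limits for samples of 200, copied from the table row for that size.
ALPHA3_LIMIT_200 = 0.28
ALPHA4_LIMITS_200 = (2.51, 3.57)

def within_90_limits_n200(scores):
    """True if both ratios of a sample of 200 fall inside the tabled 90 per cent limits."""
    if len(scores) != 200:
        raise ValueError("the limits used here apply only to samples of 200")
    a3, a4 = moment_ratios(scores)
    low, high = ALPHA4_LIMITS_200
    return abs(a3) <= ALPHA3_LIMIT_200 and low <= a4 <= high

a3, a4 = moment_ratios([1, 2, 3, 4, 5])  # a small symmetric, flat-topped set of scores
```

For the five scores 1 to 5 the skewness is zero and the kurtosis is 1.7, well below the value of 3 expected of a normal distribution.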
TABLE I


SQUARES AND SQUARE ROOTS


n | n² | √n | √10n | n | n² | √n | √10n
1.00 | 1.0000 | 1.00000 | 3.16228 | 1.50 | 2.2500 | 1.22474 | 3.87298
1.01 | 1.0201 | 1.00499 | 3.17805 | 1.51 | 2.2801 | 1.22882 | 3.88587
1.02 | 1.0404 | 1.00995 | 3.19374 | 1.52 | 2.3104 | 1.23288 | 3.89872
1.03 | 1.0609 | 1.01489 | 3.20936 | 1.53 | 2.3409 | 1.23693 | 3.91152
1.04 | 1.0816 | 1.01980 | 3.22490 | 1.54 | 2.3716 | 1.24097 | 3.92428
1.05 | 1.1025 | 1.02470 | 3.24037 | 1.55 | 2.4025 | 1.24499 | 3.93700
1.06 | 1.1236 | 1.02956 | 3.25576 | 1.56 | 2.4336 | 1.24900 | 3.94968
1.07 | 1.1449 | 1.03441 | 3.27109 | 1.57 | 2.4649 | 1.25300 | 3.96232
1.08 | 1.1664 | 1.03923 | 3.28634 | 1.58 | 2.4964 | 1.25698 | 3.97492
1.09 | 1.1881 | 1.04403 | 3.30151 | 1.59 | 2.5281 | 1.26095 | 3.98748
1.10 | 1.2100 | 1.04881 | 3.31662 | 1.60 | 2.5600 | 1.26491 | 4.00000
1.11 | 1.2321 | 1.05357 | 3.33167 | 1.61 | 2.5921 | 1.26886 | 4.01248
1.12 | 1.2544 | 1.05830 | 3.34664 | 1.62 | 2.6244 | 1.27279 | 4.02492
1.13 | 1.2769 | 1.06301 | 3.36155 | 1.63 | 2.6569 | 1.27671 | 4.03733
1.14 | 1.2996 | 1.06771 | 3.37639 | 1.64 | 2.6896 | 1.28062 | 4.04969
1.15 | 1.3225 | 1.07238 | 3.39116 | 1.65 | 2.7225 | 1.28452 | 4.06202
1.16 | 1.3456 | 1.07703 | 3.40588 | 1.66 | 2.7556 | 1.28841 | 4.07431
1.17 | 1.3689 | 1.08167 | 3.42053 | 1.67 | 2.7889 | 1.29228 | 4.08656
1.18 | 1.3924 | 1.08628 | 3.43511 | 1.68 | 2.8224 | 1.29615 | 4.09878
1.19 | 1.4161 | 1.09087 | 3.44964 | 1.69 | 2.8561 | 1.30000 | 4.11096
1.20 | 1.4400 | 1.09545 | 3.46410 | 1.70 | 2.8900 | 1.30384 | 4.12311
1.21 | 1.4641 | 1.10000 | 3.47851 | 1.71 | 2.9241 | 1.30767 | 4.13521
1.22 | 1.4884 | 1.10454 | 3.49285 | 1.72 | 2.9584 | 1.31149 | 4.14729
1.23 | 1.5129 | 1.10905 | 3.50714 | 1.73 | 2.9929 | 1.31529 | 4.15933
1.24 | 1.5376 | 1.11355 | 3.52136 | 1.74 | 3.0276 | 1.31909 | 4.17133
1.25 | 1.5625 | 1.11803 | 3.53553 | 1.75 | 3.0625 | 1.32288 | 4.18330
1.26 | 1.5876 | 1.12250 | 3.54965 | 1.76 | 3.0976 | 1.32665 | 4.19524
1.27 | 1.6129 | 1.12694 | 3.56371 | 1.77 | 3.1329 | 1.33041 | 4.20714
1.28 | 1.6384 | 1.13137 | 3.57771 | 1.78 | 3.1684 | 1.33417 | 4.21900
1.29 | 1.6641 | 1.13578 | 3.59166 | 1.79 | 3.2041 | 1.33791 | 4.23084
1.30 | 1.6900 | 1.14018 | 3.60555 | 1.80 | 3.2400 | 1.34164 | 4.24264
1.31 | 1.7161 | 1.14455 | 3.61939 | 1.81 | 3.2761 | 1.34536 | 4.25441
1.32 | 1.7424 | 1.14891 | 3.63318 | 1.82 | 3.3124 | 1.34907 | 4.26615
1.33 | 1.7689 | 1.15326 | 3.64692 | 1.83 | 3.3489 | 1.35277 | 4.27785
1.34 | 1.7956 | 1.15758 | 3.66060 | 1.84 | 3.3856 | 1.35647 | 4.28952
1.35 | 1.8225 | 1.16190 | 3.67423 | 1.85 | 3.4225 | 1.36015 | 4.30116
1.36 | 1.8496 | 1.16619 | 3.68782 | 1.86 | 3.4596 | 1.36382 | 4.31277
1.37 | 1.8769 | 1.17047 | 3.70135 | 1.87 | 3.4969 | 1.36748 | 4.32435
1.38 | 1.9044 | 1.17473 | 3.71484 | 1.88 | 3.5344 | 1.37113 | 4.33590
1.39 | 1.9321 | 1.17898 | 3.72827 | 1.89 | 3.5721 | 1.37477 | 4.34741
1.40 | 1.9600 | 1.18322 | 3.74166 | 1.90 | 3.6100 | 1.37840 | 4.35890
1.41 | 1.9881 | 1.18743 | 3.75500 | 1.91 | 3.6481 | 1.38203 | 4.37035
1.42 | 2.0164 | 1.19164 | 3.76829 | 1.92 | 3.6864 | 1.38564 | 4.38178
1.43 | 2.0449 | 1.19583 | 3.78153 | 1.93 | 3.7249 | 1.38924 | 4.39318
1.44 | 2.0736 | 1.20000 | 3.79473 | 1.94 | 3.7636 | 1.39284 | 4.40454
1.45 | 2.1025 | 1.20416 | 3.80789 | 1.95 | 3.8025 | 1.39642 | 4.41588
1.46 | 2.1316 | 1.20830 | 3.82099 | 1.96 | 3.8416 | 1.40000 | 4.42719
1.47 | 2.1609 | 1.21244 | 3.83406 | 1.97 | 3.8809 | 1.40357 | 4.43847
1.48 | 2.1904 | 1.21655 | 3.84708 | 1.98 | 3.9204 | 1.40712 | 4.44972
1.49 | 2.2201 | 1.22066 | 3.86005 | 1.99 | 3.9601 | 1.41067 | 4.46094
1.50 | 2.2500 | 1.22474 | 3.87298 | 2.00 | 4.0000 | 1.41421 | 4.47214
n | n² | √n | √10n | n | n² | √n | √10n


TABLE I. (Continued) 
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1.41421 
1.41774 
1.42127 
1.42478 
1.42829 


1.43178 
1.43527 
1.43875 
1.44222 
1.44568 


1.44914 
1.45258 
1.45602 
1.45945 
1.46287 


1.46629 
1.46969 
1.47309 
1.47648 
1.47986 


1.48324 
1.48661 
1.48997 
1.49332 
1.49666 


1.50000 
1.50333 
1.50665 
1.50997 
1.51327 


1.51658 
1.51987 
1.52315 
1.52643 
1.52971 


1.53297 
1.53623 
1.53948 
1.54272 
1.54596 


1.54919 
1.55242 
1.55563 
1.55885 
1.56205 


1.56525 
1.56844 
1.57162 
1.57480 
1.57797 


1.58114 


4.47214 
4.48330 
4.49444 
4.50555 
4.51664 


4.52769 
4.53872 
4.54973 
4.56070 
4.57165 


4.58258 
4.59347 
4.60435 
4.61519 
4.62601 


4.63681 
4.64758 
4.65833 
4.66905 
4.67974 


4.69042 
4.70106 
4.71169 
4.72229 
4.73286 


4.74342 
4.75395 
4.76445 
4.77493 
4.78539 


4.79583 
4.80625 
4.81664 
4.82701 
4.83735 


4.84768 
4.85798 
4.86826 
4.87852 
4.88876 


4.89898 
4.90918 
4.91935 
4.92950 
4.93964 


4.94975 
4.95984 
4.96991 
4.97996 
4.98999 


5.00000 


1.58114 
1.58430 
1.58745 
1.59060 
1.59374 


1.59687 
1.60000 
1.60312 
1.60624 
1.60935 


1.61245 
1.61555 
1.61864 
1.62173 
1.62481 


1.62788 
1.63095 
1.63401 
1.63707 
1.64012 


1.64317 
1.64621 
1.64924 
1.65227 
1.65529 


1.65831 
1.66132 
1.66433 
1.66733 
1.67033 


1.67332 
1.67631 
1.67929 
1.68226 
1.68523 


1.68819 
1.69115 
1.69411 
1.69706 
1.70000 


1.70294 
1.70587 
1.70880 
1.71172 
1.71464 


1.71756 
1.72047 
1.72337 
1.72627 
1.72916 


1.73205 


5.00000 
5.00999 
5.01996 
5.02991 
5.03984 


5.04975 
5.05964 
5.06952 
5.07937 
5.08920 


5.09902 
5.10882 
5.11859 
5.12835 
5.13809 


5.14782 
5.15752 
5.16720 
5.17687 
5.18652 


5.19615 
5.20577 
5.21536 
5.22494 
5.23450 


5.24404 
5.25357 
5.26308 
5.27257 
5.28205 


5.29150 
5.30094 
5.31037 
5.31977 
5.32917 


5.33854 
5.34790 
5.35724 
5.36656 
5.37587 


5.38516 
5.39444 
5.40370 
5.41295 
5.42218 


5.43139 
5.44059 
5.44977 
5.45894 
5.46809 


5.47723 
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TABLE I. (Continued) 


Vion 


п n 


У | “а | 


5 9.0000 
3.01 9.0601 
3.02 9.1204 
3.03 9.1809 
3.04 9.2416 


3.05 9.3025 
3.06 9.3636 


3.08 9.4864 
3.09 9.5481 


9.6100 


9.8596 


341 9.6721 1.76352 5.57674 
3.12 9.7344 1.76635 5.58570 
343 9.7969 1.76918 5.59464 


9.9225 
3.16 9.9856 
3.17 10.0489 
3.18 10.1124 
3.19 10.1761 


320 10.2400 
321 10.3041 
3.22 10.3684 


3.36 11.2896 
3.37 11.3569 
3.38 11.4244 


11.4921 


[ 11.5600 
3.41 11.6281 
3.42 11.6964 
3.43 11.7649 
11.8336 


11.9025 
3.46 11.9716 
3.47 12.0409 
3.48 12.1104 
12.1801 


12.2500 


1.73205 
1.73494 5.48635 
1.73781 5.49545 
1.74069 5.50454 
1.74356 5.51362 


1.74642 5.52268 
1.74929 5.53173 
3.07 9.4249 1.75214 5.54076 
1.75499 5.54977 
1.75784 5.55878 


1.76068 


1.77200 


1.77482 
1.77764 5.62139 
1.78045 5.63028 
1.78326 5.63915 
1.78606 5.64801 


1.78885 5.65685 
1.79165 5.66569 
1.79444 5.67450 
3.23 10.4329 1.79722 5.68331 
10.4976 


Fy 10.5625 
3.26 10.6276 
3.27 10.6929
3.28 10.7584 
10.8241 


10.8900 
3.31 10.9561 
3.32 11.0224 
3.33 11.0889 
11.1556 


11.2225 


1.80000 


1.80278 
1.80555 5.70964 
1.80831 5.71839 
1.81108 5.72713 
1.81384 


1.81659 
1.81934 5.75326 
1.82209 5.76194
1.82483 5.77062 
1.82757 


1.83030 
1.83303 5.79655 
1.83576 5.80517 
1.83848 5.81378 
1.84120 


1.84391 
1.84662 5.83952 
1.84932 5.84808 
1.85203 5.85662 
1.85472 


1.85742 
1.86011 5.88218 
1.86279 5.89067 
1.86548 5.89915 
1.86815 


1.87083 


5.47723 


5.56776 


5.60357 
5.61249 


5.69210 
5.70088 


5.73585 
5.74456 


5.77927 
5.78792 


5.82237 
5.83095 


5.86515 
5.87367 


5.90762 
5.91608 


3.50 12.2500 1.87083 5.91608
3.51 12.3201 1.87350 5.92453 
3.52 12.3904 1.87617 5.93296 
3.53 12.4609 1.87883 5.94138 
3.54 12.5316 1.88149 5.94979 


3.55 12.6025 1.88414 5.95819 
3.56 12.6736 1.88680 5.96657 
3.57 12.7449 1.88944 5.97495 
3.58 12.8164 1.89209 5.98331 
3.59 12.8881 1.89473 5.99166 


3.60 12.9600 1.89737 6.00000 
3.61 13.0321 1.90000 6.00833 
3.62 13.1044 1.90263 6.01664 
3.63 13.1769 1.90526 6.02495 
3.64 13.2496 1.90788 6.03324 


3.65 13.3225 1.91050 6.04152 
3.66 13.3956 1.91311 6.04979 
3.67 13.4689 1.91572 6.05805 
3.68 13.5424 1.91833 6.06630 
3.69 13.6161 1.92094 6.07454 


3.70 13.6900 1.92354 6.08276


3.71 13.7641 1.92614 6.09098 
3.72 13.8384 1.92873 6.09918
3.73 13.9129 1.93132 6.10737 
3.74 13.9876 1.93391 6.11555


3.75 14.0625 1.93649 6.12372 
3.76 14.1376 1.93907 6.13188 
3.77 14.2129 1.94165 6.14003 
3.78 14.2884 1.94422 6.14817 
3.79 14.3641 1.94679 6.15630 


3.80 14.4400 1.94936 6.16441 
3.81 14.5161 1.95192 6.17252 
3.82 14.5924 1.95448 6.18061 
3.83 14.6689 1.95704 6.18870 
3.84 14.7456 1.95959 6.19677 


3.85 14.8225 1.96214 6.20484 
3.86 14.8996 1.96469 6.21289 
3.87 14.9769 1.96723 6.22093 
3.88 15.0544 1.96977 6.22896 
3.89 15.1321 1.97231 6.23699 


3.90 15.2100 1.97484 6.24500 
3.91 15.2881 1.97737 6.25300 
3.92 15.3664 1.97990 6.26099 
3.93 15.4449 1.98242 6.26897 
3.94 15.5236 1.98494 6.27694 


3.95 15.6025 1.98746 6.28490 
3.96 15.6816 1.98997 6.29285 
3.97 15.7609 1.99249 6.30079 
3.98 15.8404 1.99499 6.30872
15.9201 


16.0000 


1.99750
2.00000 


6.31664 
6.32456 
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n 
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Уп 
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16.0000 
16.0801 
16.1604 
16.2409 
16.3216 


16.4025 
16.4836 
16.5649 
16.6464 
16.7281 


16.8100 
16.8921 
16.9744 
17.0569 
17.1396 


17.2225 
17.3056 
17.3889 
17.4724 
17.5561 


17.6400 
17.7241 
17.8084 
17.8929 
17.9776 


18.0625 
18.1476 
18.2329 
18.3184 
18.4041 


18.4900 
18.5761 
18.6624 
18.7489 
18.8356 


18.9225 
19.0096 
19.0969 
19.1844 
19.2721 


19.3600 
19.4481 
19.5364 
19.6249 
19.7136 


19.8025 
19.8916 
19.9809 
20.0704 
20.1601 


20.2500 


2.01246 
2.01494 
2.01742 
2.01990 
2.02237 


2.02485 
2.02731 
2.02978 
2.03224 
2.03470 


2.03715 
2.03961 
2.04206 
2.04450 
2.04695 


2.04939 
2.05183 
2.05426 
2.05670 
2.05913 


2.06155 
2.06398 
2.06640 
2.06882 
2.07123 


2.07364 
2.07605 
2.07846 
2.08087 
2.08327 


2.08567 
2.08806 
2.09045 
2.09284 
2.09523 


2.09762 
2.10000 
2.10238 
2.10476 
2.10713 


2.10950 
2.11187 
2.11424 
2.11660 
2.11896 


2.12132 


6.32456 
6.33246 
6.34035 
6.34823 
6.35610 


6.36396 
6.37181 
6.37966 
6.38749 
6.39531 


6.40312 
6.41093 
6.41872 
6.42651 
6.43428 


6.44205 
6.44981 
6.45755 
6.46529 
6.47302 


6.48074 
6.48845 
6.49615 
6.50384 
6.51153 


6.51920 
6.52687 
6.53452 
6.54217 
6.54981 


6.55744 
6.56506 
6.57267 
6.58027 
6.58787 


6.59545 
6.60303 
6.61060 
6.61816 
6.62571 


6.63325 
6.64078 
6.64831 
6.65582 
6.66333 


6.67083 
6.67832 
6.68581 
6.69328 
6.70075 


6.70820 


20.2500 
20.3401 
20.4304 
20.5209 
20.6116 


20.7025 
20.7936 
20.8849 
20.9764 
21.0681 


21.1600 
21.2521 
21.3444 
21.4369 
21.5296 


21.6225 
21.7156 
21.8089 
21.9024 
21.9961 


22.0900 
22.1841 
22.2784 
22.3729 
22.4676 


22.5625 
22.6576 
22.7529
22.8484
22.9441 


23.0400 
23.1361 
23.2324 
23.3289 
23.4256 


23.5225 
23.6196 
23.7169 
23.8144 
23.9121 


24.0100 
24.1081 
24.2064 
24.3049 
24.4036 


24.5025 
24.6016 
24.7009 
24.8004 
24.9001 


25.0000 


2.12132 
2.12368 
2.12603 
2.12838 
2.13073 


2.13307 
2.13542 
2.13776 
2.14009 
2.14243 


2.14476 
2.14709 
2.14942 
2.15174
2.15407 


2.15639 
2.15870 
2.16102 
2.16333 
2.16564 


2.16795 
2.17025 
2.17256 
2.17486 
2.17715 


2.17945 
2.18174 
2.18403 
2.18632 
2.18861 


2.19089 
2.19317 
2.19545 
2.19773 
2.20000 


2.20227 
2.20454 
2.20681 
2.20907 
2.21133 


2.21359 
2.21585 
2.21811 
2.22036 
2.22261 


2.22486 
2.22711 
2.22935 
2.23159 
2.23383 


2.23607 


6.70820 
6.71565 
6.72309 
6.73053 
6.73795 


6.74537 
6.75278 
6.76018 
6.76757 
6.77495 


6.78233 
6.78970 
6.79706 
6.80441 
6.81175 


6.81909 
6.82642 
6.83374 
6.84105 
6.84836 


6.85565 
6.86294 
6.87023 
6.87750 
6.88477 


6.89202 
6.89928 
6.90652 
6.91375 
6.92098 


6.92820 
6.93542 
6.94262 
6.94982 
6.95701 


6.96419 
6.97137 
6.97854 
6.98570 
6.99285 


7.00000 
7.00714 
7.01427 
7.02140 
7.02851 


7.03562 
7.04273 
7.04982 
7.05691 
7.06399 


7.07107 
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TABLE I. (Continued) 


Уп 


У10п 


2.34521 
2.34734 
2.34947 
2.35160 
2.35372 


2.35584 
2.35797 
2.36008 
2.36220 
2.36432 


2.36643 
2.36854 
2.37065 
2.37276 
2.37487 


2.37697
2.37908 
2.38118 
2.38328 
2.38537 


2.38747 
2.38956 
2.39165 
2.39374 
2.39583 


2.39792 
2.40000 
2.40208 
2.40416 
2.40624 


2.40832 
2.41039 
2.41247 
2.41454 
2.41661 


2.41868 
2.42074 
2.42281 
2.42487 
2.42693 


2.42899 
2.43105 
2.43311 
2.43516 
2.43721


2.43926 
2.44131 
2.44336 
2.44540 
2.44745 


2.44949 


7.41620 
7.42294 
7.42967 
7.43640 
7.44312 


7.44983 
7.45654 
7.46324 
7.46994 
7.47663 


7.48331 
7.48999 
7.49667 
7.50333 
7.50999 


7.51665 
7.52330 
7.52994 
7.53658 
7.54321 


7.54983 
7.55645 
7.56307 
7.56968 
7.57628 


7.58288 
7.58947 
7.59605 
7.60263 
7.60920


7.61577 
7.62234 
7.62889 
7.63544 
7.64199 


7.64853 
7.65506 
7.66159 
7.66812 
7.67463 


7.68115 
7.68765 
7.69415
7.70065 
7.70714 


7.71362
7.72010 
7.72658 
7.73305
7.73951


7.74597 


п? Уп Vion п? 

4 25.0000 | 2.23607 | 7.07107 Ў 30.2500 
5.01 | 251001 | 2.23830 | 7.07814 | 5.51 | 30.3601 
502 | 252004 | 2.24054 | 7.08520 | 5.52 | 30.4704 
503 | 253009 | 2.24277 | 7.09225 | 5.53 | 30.5809 
5.04 | 254016 | 2.24499 | 7.00930 | 5.54 | 30.6916 
505 | 25.5025 | 2.24722 | 710634 | 5.55 | 30.8025 
5.06 25.6036 2.24944 7.11337 5.56 30.9136 
507 | 25.7049 | 2.25167 | 7.12039 | 557 | 31.0249 
508 | 258064 | 2.25389 | 7.12741 | 5.58 | 31.1364 

25.9081 | 2.25610 | 7.13442 | 5.59 | 31.2481 
510 | 26.0100 | 2.25832 | 7.14143 | 5.60 | 31.3600 
sil | 261121 | 2.26053 | 7.14843 | 5.61 | 314721 
$12 | 262144 | 2.26274 | 715542 | 5.62 | 31.5844 
513 | 263160 | 226405 | 7.16240 | 5.63 | 31.6969 
514 | 26.4196 | 2.26716 | 7.16938 | 5.64 | 31.8096 
5.15 26.5225 2.26936 7.17635 5.65 31.9225 
516 | 26.6256 | 2.27156 | 718331 | 5.66 | 32.0356 
517 | 26.7289 | 2.27376 | 7.19027 | 5.67 | 32.1489 
518 | 26.8324 | 2.27596 | 7.19722 | 5.68 | 32.2624 
269361 | 227816 | 72047 | 5.69 | 32.3761 
27.0400 | 2.28035 | 7.21110 | 570 | 32.4900 
521 | 271441 | 2.28254 | 7.21803 | 5.71 | 32.6041 
5.22 | 27.2484 | 228473 | 7.22496 | 572 | 32.1184 
523 | 273529 | 2.28602 | 7.23187 | 573 | 32.8329 
274576 | 2.28910 | 7.23878 | 574 | 32.9476 
27.5625 | 2.29129 | 7.24569 | 5.75 | 33.0625 
526 | 27.6676 | 2.29347 | 7.25259 | 576 | 33.1776 
$27 | 27.7729 | 2.29565 | 7.25948 | 5.77 | 33.2929 
528 | 27.8784 | 2.29783 | 7.26636 | 5.78 | 33.4084 
27.9841 | 2.30000 | 7.27324 | 5.79 | 33.5241 
28.0900 | 2.30217 | 7.28011 | 580 | 33.6400 
531 | 28.1961 | 2.30434 | 7.28697 | 5.81 | 33.1561 
532 | 283024 | 230651 | 7.29383 | 5.82 | 33.8724 
533 | 28.4089 | 2.30868 | 7.30068 | 5.83 | 33.9889 
28.5156 | 2.31084 | 7.30753 | 5.84 | 34.1056 
28.6225 | 2.31301 | 7.31437 | 5.85 | 34.2225 
536 | 28.7296 | 2.31517 | 7.32120 | 586 | 34.3396 
537 | 288369 | 2.31733 | 7.32803 | 5.87 | 34.4569 
538 | 28.9444 | 2.31948 | 7.33485 | 5.88 | 34.5744 
290521 | 2.32164 | 7.34166 | 5.89 | 34.6921 

) 29.1600 | 2.32379 | 7.34847 | 590 | 34.8100 
541 | 29.2681 | 2.32594 | 7.35527 | 591 | 34.9281 
5.42 | 29.3764 | 2.32809 | 7.36206 | 5.92 | 35.0464 
543 | 29.4849 | 2.33024 | 7.36885 | 5.93 | 35.1649 

. 29.5936 | 2.33238 | 7.37564 | 5.94 | 35.2836 

29.1025 | 2.33452 | 7.38241 | 595 | 35.4025 
546 | 29.8116 | 2.33666 | 7.38918 | 596 | 35.5216 
547 | 29.9209 | 2.33880 | 7.39594 | 5.97 | 35.6409 
548 | 30.0304 | 2.34094 | 7.40270 | 5.98 | 35.7604 
549 | 30.1401 | 2.34307 | 7.40945 | 5.99 | 35.8801 
30.2500 2.34521 7.41620 36.0000 

nt Уп V10n n nt 
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TABLE I. (Continued) 


n | n² | √n | √10n | n | n² | √n | √10n
6.00 | 36.0000 | 2.44949 | 7.74597 | 6.50 | 42.2500 | 2.54951 | 8.06226
6.01 | 36.1201 | 2.45153 | 7.75242 | 6.51 | 42.3801 | 2.55147 | 8.06846
6.02 | 36.2404 | 2.45357 | 7.75887 | 6.52 | 42.5104 | 2.55343 | 8.07465
6.03 | 36.3609 | 2.45561 | 7.76531 | 6.53 | 42.6409 | 2.55539 | 8.08084
6.04 | 36.4816 | 2.45764 | 7.77174 | 6.54 | 42.7716 | 2.55734 | 8.08703
6.05 | 36.6025 | 2.45967 | 7.77817 | 6.55 | 42.9025 | 2.55930 | 8.09321
6.06 | 36.7236 | 2.46171 | 7.78460 | 6.56 | 43.0336 | 2.56125 | 8.09938
6.07 | 36.8449 | 2.46374 | 7.79102 | 6.57 | 43.1649 | 2.56320 | 8.10555
6.08 | 36.9664 | 2.46577 | 7.79744 | 6.58 | 43.2964 | 2.56515 | 8.11172
6.09 | 37.0881 | 2.46779 | 7.80385 | 6.59 | 43.4281 | 2.56710 | 8.11788
6.10 | 37.2100 | 2.46982 | 7.81025 | 6.60 | 43.5600 | 2.56905 | 8.12404
6.11 | 37.3321 | 2.47184 | 7.81665 | 6.61 | 43.6921 | 2.57099 | 8.13019
6.12 | 37.4544 | 2.47386 | 7.82304 | 6.62 | 43.8244 | 2.57294 | 8.13634
6.13 | 37.5769 | 2.47588 | 7.82943 | 6.63 | 43.9569 | 2.57488 | 8.14248
6.14 | 37.6996 | 2.47790 | 7.83582 | 6.64 | 44.0896 | 2.57682 | 8.14862
6.15 | 37.8225 | 2.47992 | 7.84219 | 6.65 | 44.2225 | 2.57876 | 8.15475
6.16 | 37.9456 | 2.48193 | 7.84857 | 6.66 | 44.3556 | 2.58070 | 8.16088
6.17 | 38.0689 | 2.48395 | 7.85493 | 6.67 | 44.4889 | 2.58263 | 8.16701
6.18 | 38.1924 | 2.48596 | 7.86130 | 6.68 | 44.6224 | 2.58457 | 8.17313
6.19 | 38.3161 | 2.48797 | 7.86766 | 6.69 | 44.7561 | 2.58650 | 8.17924
6.20 | 38.4400 | 2.48998 | 7.87401 | 6.70 | 44.8900 | 2.58844 | 8.18535
6.21 | 38.5641 | 2.49199 | 7.88036 | 6.71 | 45.0241 | 2.59037 | 8.19146
6.22 | 38.6884 | 2.49399 | 7.88670 | 6.72 | 45.1584 | 2.59230 | 8.19756
6.23 | 38.8129 | 2.49600 | 7.89303 | 6.73 | 45.2929 | 2.59422 | 8.20366
6.24 | 38.9376 | 2.49800 | 7.89937 | 6.74 | 45.4276 | 2.59615 | 8.20975
6.25 | 39.0625 | 2.50000 | 7.90569 | 6.75 | 45.5625 | 2.59808 | 8.21584
6.26 | 39.1876 | 2.50200 | 7.91202 | 6.76 | 45.6976 | 2.60000 | 8.22192
6.27 | 39.3129 | 2.50400 | 7.91833 | 6.77 | 45.8329 | 2.60192 | 8.22800
6.28 | 39.4384 | 2.50599 | 7.92465 | 6.78 | 45.9684 | 2.60384 | 8.23408
6.29 | 39.5641 | 2.50799 | 7.93095 | 6.79 | 46.1041 | 2.60576 | 8.24015
6.30 | 39.6900 | 2.50998 | 7.93725 | 6.80 | 46.2400 | 2.60768 | 8.24621
6.31 | 39.8161 | 2.51197 | 7.94355 | 6.81 | 46.3761 | 2.60960 | 8.25227
6.32 | 39.9424 | 2.51396 | 7.94984 | 6.82 | 46.5124 | 2.61151 | 8.25833
6.33 | 40.0689 | 2.51595 | 7.95613 | 6.83 | 46.6489 | 2.61343 | 8.26438
6.34 | 40.1956 | 2.51794 | 7.96241 | 6.84 | 46.7856 | 2.61534 | 8.27043
6.35 | 40.3225 | 2.51992 | 7.96869 | 6.85 | 46.9225 | 2.61725 | 8.27647
6.36 | 40.4496 | 2.52190 | 7.97496 | 6.86 | 47.0596 | 2.61916 | 8.28251
6.37 | 40.5769 | 2.52389 | 7.98123 | 6.87 | 47.1969 | 2.62107 | 8.28855
6.38 | 40.7044 | 2.52587 | 7.98749 | 6.88 | 47.3344 | 2.62298 | 8.29458
6.39 | 40.8321 | 2.52784 | 7.99375 | 6.89 | 47.4721 | 2.62488 | 8.30060
6.40 | 40.9600 | 2.52982 | 8.00000 | 6.90 | 47.6100 | 2.62679 | 8.30662
6.41 | 41.0881 | 2.53180 | 8.00625 | 6.91 | 47.7481 | 2.62869 | 8.31264
6.42 | 41.2164 | 2.53377 | 8.01249 | 6.92 | 47.8864 | 2.63059 | 8.31865
6.43 | 41.3449 | 2.53574 | 8.01873 | 6.93 | 48.0249 | 2.63249 | 8.32466
6.44 | 41.4736 | 2.53772 | 8.02496 | 6.94 | 48.1636 | 2.63439 | 8.33067
6.45 | 41.6025 | 2.53969 | 8.03119 | 6.95 | 48.3025 | 2.63629 | 8.33667
6.46 | 41.7316 | 2.54165 | 8.03741 | 6.96 | 48.4416 | 2.63818 | 8.34266
6.47 | 41.8609 | 2.54362 | 8.04363 | 6.97 | 48.5809 | 2.64008 | 8.34865
6.48 | 41.9904 | 2.54558 | 8.04984 | 6.98 | 48.7204 | 2.64197 | 8.35464
6.49 | 42.1201 | 2.54755 | 8.05605 | 6.99 | 48.8601 | 2.64386 | 8.36062
6.50 | 42.2500 | 2.54951 | 8.06226 | 7.00 | 49.0000 | 2.64575 | 8.36660
n | n² | √n | √10n | n | n² | √n | √10n




TABLE I. (Continued)


n | n² | √n | √10n | n | n² | √n | √10n
7.00 | 49.0000 | 2.64575 | 8.36660 | 7.50 | 56.2500 | 2.73861 | 8.66025
7.01 | 49.1401 | 2.64764 | 8.37257 | 7.51 | 56.4001 | 2.74044 | 8.66603
7.02 | 49.2804 | 2.64953 | 8.37854 | 7.52 | 56.5504 | 2.74226 | 8.67179
7.03 | 49.4209 | 2.65141 | 8.38451 | 7.53 | 56.7009 | 2.74408 | 8.67756
7.04 | 49.5616 | 2.65330 | 8.39047 | 7.54 | 56.8516 | 2.74591 | 8.68332
7.05 | 49.7025 | 2.65518 | 8.39643 | 7.55 | 57.0025 | 2.74773 | 8.68907
7.06 | 49.8436 | 2.65707 | 8.40238 | 7.56 | 57.1536 | 2.74955 | 8.69483
7.07 | 49.9849 | 2.65895 | 8.40833 | 7.57 | 57.3049 | 2.75136 | 8.70057
7.08 | 50.1264 | 2.66083 | 8.41427 | 7.58 | 57.4564 | 2.75318 | 8.70632
7.09 | 50.2681 | 2.66271 | 8.42021 | 7.59 | 57.6081 | 2.75500 | 8.71206
7.10 | 50.4100 | 2.66458 | 8.42615 | 7.60 | 57.7600 | 2.75681 | 8.71780
7.11 | 50.5521 | 2.66646 | 8.43208 | 7.61 | 57.9121 | 2.75862 | 8.72353
7.12 | 50.6944 | 2.66833 | 8.43801 | 7.62 | 58.0644 | 2.76043 | 8.72926
7.13 | 50.8369 | 2.67021 | 8.44393 | 7.63 | 58.2169 | 2.76225 | 8.73499
7.14 | 50.9796 | 2.67208 | 8.44985 | 7.64 | 58.3696 | 2.76405 | 8.74071
7.15 | 51.1225 | 2.67395 | 8.45577 | 7.65 | 58.5225 | 2.76586 | 8.74643
7.16 | 51.2656 | 2.67582 | 8.46168 | 7.66 | 58.6756 | 2.76767 | 8.75214
7.17 | 51.4089 | 2.67769 | 8.46759 | 7.67 | 58.8289 | 2.76948 | 8.75785
7.18 | 51.5524 | 2.67955 | 8.47349 | 7.68 | 58.9824 | 2.77128 | 8.76356
7.19 | 51.6961 | 2.68142 | 8.47939 | 7.69 | 59.1361 | 2.77308 | 8.76926
7.20 | 51.8400 | 2.68328 | 8.48528 | 7.70 | 59.2900 | 2.77489 | 8.77496
7.21 | 51.9841 | 2.68514 | 8.49117 | 7.71 | 59.4441 | 2.77669 | 8.78066
7.22 | 52.1284 | 2.68701 | 8.49706 | 7.72 | 59.5984 | 2.77849 | 8.78635
7.23 | 52.2729 | 2.68887 | 8.50294 | 7.73 | 59.7529 | 2.78029 | 8.79204
7.24 | 52.4176 | 2.69072 | 8.50882 | 7.74 | 59.9076 | 2.78209 | 8.79773
7.25 | 52.5625 | 2.69258 | 8.51469 | 7.75 | 60.0625 | 2.78388 | 8.80341
7.26 | 52.7076 | 2.69444 | 8.52056 | 7.76 | 60.2176 | 2.78568 | 8.80909
7.27 | 52.8529 | 2.69629 | 8.52643 | 7.77 | 60.3729 | 2.78747 | 8.81476
7.28 | 52.9984 | 2.69815 | 8.53229 | 7.78 | 60.5284 | 2.78927 | 8.82043
7.29 | 53.1441 | 2.70000 | 8.53815 | 7.79 | 60.6841 | 2.79106 | 8.82610
7.30 | 53.2900 | 2.70185 | 8.54400 | 7.80 | 60.8400 | 2.79285 | 8.83176
7.31 | 53.4361 | 2.70370 | 8.54985 | 7.81 | 60.9961 | 2.79464 | 8.83742
7.32 | 53.5824 | 2.70555 | 8.55570 | 7.82 | 61.1524 | 2.79643 | 8.84308
7.33 | 53.7289 | 2.70740 | 8.56154 | 7.83 | 61.3089 | 2.79821 | 8.84873
7.34 | 53.8756 | 2.70924 | 8.56738 | 7.84 | 61.4656 | 2.80000 | 8.85438
7.35 | 54.0225 | 2.71109 | 8.57321 | 7.85 | 61.6225 | 2.80179 | 8.86002
7.36 | 54.1696 | 2.71293 | 8.57904 | 7.86 | 61.7796 | 2.80357 | 8.86566
7.37 | 54.3169 | 2.71477 | 8.58487 | 7.87 | 61.9369 | 2.80535 | 8.87130
7.38 | 54.4644 | 2.71662 | 8.59069 | 7.88 | 62.0944 | 2.80713 | 8.87694
7.39 | 54.6121 | 2.71846 | 8.59651 | 7.89 | 62.2521 | 2.80891 | 8.88257
7.40 | 54.7600 | 2.72029 | 8.60233 | 7.90 | 62.4100 | 2.81069 | 8.88819
7.41 | 54.9081 | 2.72213 | 8.60814 | 7.91 | 62.5681 | 2.81247 | 8.89382
7.42 | 55.0564 | 2.72397 | 8.61394 | 7.92 | 62.7264 | 2.81425 | 8.89944
7.43 | 55.2049 | 2.72580 | 8.61974 | 7.93 | 62.8849 | 2.81603 | 8.90505
7.44 | 55.3536 | 2.72764 | 8.62554 | 7.94 | 63.0436 | 2.81780 | 8.91067
7.45 | 55.5025 | 2.72947 | 8.63134 | 7.95 | 63.2025 | 2.81957 | 8.91628
7.46 | 55.6516 | 2.73130 | 8.63713 | 7.96 | 63.3616 | 2.82135 | 8.92188
7.47 | 55.8009 | 2.73313 | 8.64292 | 7.97 | 63.5209 | 2.82312 | 8.92749
7.48 | 55.9504 | 2.73496 | 8.64870 | 7.98 | 63.6804 | 2.82489 | 8.93308
7.49 | 56.1001 | 2.73679 | 8.65448 | 7.99 | 63.8401 | 2.82666 | 8.93868
7.50 | 56.2500 | 2.73861 | 8.66025 | 8.00 | 64.0000 | 2.82843 | 8.94427
n | n² | √n | √10n | n | n² | √n | √10n




TABLE I. (Continued) 


n | n² | √n | √10n | n | n² | √n | √10n
8.00 | 64.0000 | 2.82843 | 8.94427 | 8.50 | 72.2500 | 2.91548 | 9.21954
8.01 | 64.1601 | 2.83019 | 8.94986 | 8.51 | 72.4201 | 2.91719 | 9.22497
8.02 | 64.3204 | 2.83196 | 8.95545 | 8.52 | 72.5904 | 2.91890 | 9.23038
8.03 | 64.4809 | 2.83373 | 8.96103 | 8.53 | 72.7609 | 2.92062 | 9.23580
8.04 | 64.6416 | 2.83549 | 8.96660 | 8.54 | 72.9316 | 2.92233 | 9.24121
8.05 | 64.8025 | 2.83725 | 8.97218 | 8.55 | 73.1025 | 2.92404 | 9.24662
8.06 | 64.9636 | 2.83901 | 8.97775 | 8.56 | 73.2736 | 2.92575 | 9.25203
8.07 | 65.1249 | 2.84077 | 8.98332 | 8.57 | 73.4449 | 2.92746 | 9.25743
8.08 | 65.2864 | 2.84253 | 8.98888 | 8.58 | 73.6164 | 2.92916 | 9.26283
8.09 | 65.4481 | 2.84429 | 8.99444 | 8.59 | 73.7881 | 2.93087 | 9.26823
8.10 | 65.6100 | 2.84605 | 9.00000 | 8.60 | 73.9600 | 2.93258 | 9.27362
8.11 | 65.7721 | 2.84781 | 9.00555 | 8.61 | 74.1321 | 2.93428 | 9.27901
8.12 | 65.9344 | 2.84956 | 9.01110 | 8.62 | 74.3044 | 2.93598 | 9.28440
8.13 | 66.0969 | 2.85132 | 9.01665 | 8.63 | 74.4769 | 2.93769 | 9.28978
8.14 | 66.2596 | 2.85307 | 9.02219 | 8.64 | 74.6496 | 2.93939 | 9.29516
8.15 | 66.4225 | 2.85482 | 9.02774 | 8.65 | 74.8225 | 2.94109 | 9.30054
8.16 | 66.5856 | 2.85657 | 9.03327 | 8.66 | 74.9956 | 2.94279 | 9.30591
8.17 | 66.7489 | 2.85832 | 9.03881 | 8.67 | 75.1689 | 2.94449 | 9.31128
8.18 | 66.9124 | 2.86007 | 9.04434 | 8.68 | 75.3424 | 2.94618 | 9.31665
8.19 | 67.0761 | 2.86182 | 9.04986 | 8.69 | 75.5161 | 2.94788 | 9.32202
8.20 | 67.2400 | 2.86356 | 9.05539 | 8.70 | 75.6900 | 2.94958 | 9.32738
8.21 | 67.4041 | 2.86531 | 9.06091 | 8.71 | 75.8641 | 2.95127 | 9.33274
8.22 | 67.5684 | 2.86705 | 9.06642 | 8.72 | 76.0384 | 2.95296 | 9.33809
8.23 | 67.7329 | 2.86880 | 9.07193 | 8.73 | 76.2129 | 2.95466 | 9.34345
8.24 | 67.8976 | 2.87054 | 9.07744 | 8.74 | 76.3876 | 2.95635 | 9.34880
8.25 | 68.0625 | 2.87228 | 9.08295 | 8.75 | 76.5625 | 2.95804 | 9.35414
8.26 | 68.2276 | 2.87402 | 9.08845 | 8.76 | 76.7376 | 2.95973 | 9.35949
8.27 | 68.3929 | 2.87576 | 9.09395 | 8.77 | 76.9129 | 2.96142 | 9.36483
8.28 | 68.5584 | 2.87750 | 9.09945 | 8.78 | 77.0884 | 2.96311 | 9.37017
8.29 | 68.7241 | 2.87924 | 9.10494 | 8.79 | 77.2641 | 2.96479 | 9.37550
8.30 | 68.8900 | 2.88097 | 9.11043 | 8.80 | 77.4400 | 2.96648 | 9.38083
8.31 | 69.0561 | 2.88271 | 9.11592 | 8.81 | 77.6161 | 2.96816 | 9.38616
8.32 | 69.2224 | 2.88444 | 9.12140 | 8.82 | 77.7924 | 2.96985 | 9.39149
8.33 | 69.3889 | 2.88617 | 9.12688 | 8.83 | 77.9689 | 2.97153 | 9.39681
8.34 | 69.5556 | 2.88791 | 9.13236 | 8.84 | 78.1456 | 2.97321 | 9.40213
8.35 | 69.7225 | 2.88964 | 9.13783 | 8.85 | 78.3225 | 2.97489 | 9.40744
8.36 | 69.8896 | 2.89137 | 9.14330 | 8.86 | 78.4996 | 2.97658 | 9.41276
8.37 | 70.0569 | 2.89310 | 9.14877 | 8.87 | 78.6769 | 2.97825 | 9.41807
8.38 | 70.2244 | 2.89482 | 9.15423 | 8.88 | 78.8544 | 2.97993 | 9.42338
8.39 | 70.3921 | 2.89655 | 9.15969 | 8.89 | 79.0321 | 2.98161 | 9.42868
8.40 | 70.5600 | 2.89828 | 9.16515 | 8.90 | 79.2100 | 2.98329 | 9.43398
8.41 | 70.7281 | 2.90000 | 9.17061 | 8.91 | 79.3881 | 2.98496 | 9.43928
8.42 | 70.8964 | 2.90172 | 9.17606 | 8.92 | 79.5664 | 2.98664 | 9.44458
8.43 | 71.0649 | 2.90345 | 9.18150 | 8.93 | 79.7449 | 2.98831 | 9.44987
8.44 | 71.2336 | 2.90517 | 9.18695 | 8.94 | 79.9236 | 2.98998 | 9.45516
8.45 | 71.4025 | 2.90689 | 9.19239 | 8.95 | 80.1025 | 2.99166 | 9.46044
8.46 | 71.5716 | 2.90861 | 9.19783 | 8.96 | 80.2816 | 2.99333 | 9.46573
8.47 | 71.7409 | 2.91033 | 9.20326 | 8.97 | 80.4609 | 2.99500 | 9.47101
8.48 | 71.9104 | 2.91204 | 9.20869 | 8.98 | 80.6404 | 2.99666 | 9.47629
8.49 | 72.0801 | 2.91376 | 9.21412 | 8.99 | 80.8201 | 2.99833 | 9.48156
8.50 | 72.2500 | 2.91548 | 9.21954 | 9.00 | 81.0000 | 3.00000 | 9.48683
n | n² | √n | √10n | n | n² | √n | √10n




TABLE I. (Continued) 


У10л п n Vn У10л 
81.0000 3.00000 9.48683 9.50 90.2500 3.08221 9.14679 
81.1801 3.00167 9.49210 9.51 90.4401 3.08383 9.15192 
81.3604 3.00333 9.49737 9.52 90.6304. 3.08545 9.15705 
81.5409 3.00500 9.50263 9.53 90.8209 3.08707 9.16217 
81.7216 3.00666 9.50789 9.54 91.0116 3.08869 9.76729 
81.9025 3.00832 9.51315 9.55 91.2025 3.09031 9.17241 
82.0836 3.00998 9.51840 9.56 91.3936 3.09192 9.71753 
82.2649 3.01164 9.52365 9.57 91.5849 3.09354 9.78264 
82.4464 3.01330 9.52890 9.58 91.7764 3.09516 9.18775 
82.6281 3.01496 9.53415 9.59 91.9681 3.09677 9.79285 
82.8100 3.01662 9.53939 9.60 92.1600 3.09839 9.79796 
82.9921 3.01828 9.54463 9.61 92.3521 3.10000 9.80306 
83.1744 3.01993 9.54987 9.62 92.5444 3.10161 9.80816 
83.3569 3.02159 9.55510 9.63 92.1369 3.10322 9.81326 
83.5396 3.02324 9.56033 9.64 92.9296 3.10483 9.81835 
83.7225 3.02490 9.56556 9.65 93.1225 3.10644 9.82344 
83.9056 3.02655 9.57079 9.66 93.3156 3.10805 9.82853 
84.0889 3.02820 9.57601 9.67 93.5089 3.10966 9.83362 
84.2724 3.02985 9.58123 9.68 93.1024 3.11127 9.83870 
84.4561 3.03150 9.58645 9.69 93.8961 3.11288 9.84378 
84.6400 3.03315 9.59166 9.70 94.0900 3.11448 9.84886
84.8241 3.03480 9.59687 9.71 94.2841 3.11609 9.85393
85.0084 3.03645 9.60208 9.72 94.4784 3.11769 9.85901
85.1929 3.03809 9.60729 9.73 94.6729 3.11929 9.86408
85.3776 3.03974 9.61249 9.74 94.8676 3.12090 9.86914
85.5625 3.04138 9.61769 9.75 95.0625 3.12250 9.87421
85.7476 3.04302 9.62289 9.76 95.2576 3.12410 9.87927 
85.9329 3.04467 9.62808 9.77 95.4529 3.12570 9.88433 
86.1184 3.04631 9.63328 9.78 95.6484 3.12730 9.88939 
86.3041 3.04795 9.63846 9.79 95.8441 3.12890 9.89444 
86.4900 3.04959 9.64365 9.80 96.0400 3.13050 9.89949
86.6761 3.05123 9.64883 9.81 96.2361 3.13209 9.90454 
86.8624 3.05287 9.65401 9.82 96.4324 3.13369 9.90959 
87.0489 3.05450 9.65919 9.83 96.6289 3.13528 9.91464 
87.2356 3.05614 9.66437 9.84 96.8256 3.13688 9.91968 
87.4225 3.05778 9.66954 9.85 97.0225 3.13847 9.92472
87.6096 3.05941 9.67471 9.86 97.2196 3.14006 9.92975 
87.7969 3.06105 9.67988 9.87 97.4169 3.14166 9.93479 
87.9844 3.06268 9.68504 9.88 97.6144 3.14325 9.93982
88.1721 3.06431 9.69020 9.89 97.8121 3.14484 9.94485 
88.3600 3.06594 9.69536 9.90 98.0100 3.14643 9.94987 
88.5481 3.06757 9.70052 9.91 98.2081 3.14802 9.95490 
88.7364 3.06920 9.70567 9.92 98.4064 3.14960 9.95992
88.9249 3.07083 9.71082 9.93 98.6049 3.15119 9.96494 
89.1136 3.07246 9.71597 9.94 98.8036 3.15278 9.96995 
89.3025 3.07409 9.72111 9.95 99.0025 3.15436 9.97497 
89.4916 3.07571 9.72625 9.96 99.2016 3.15595 9.97998
89.6809 3.07734 9.73139 9.97 99.4009 3.15753 9.98499 
89.8704 3.07896 9.73653 9.98 99.6004 3.15911 9.98999 
90.0601 3.08058 9.74166 9.99 99.8001 3.16070 9.99500
90.2500 3.08221 9.74679 10.00 100.000 3.16228 10.0000 



TABLE J 
SQUARES, SQUARE ROOTS, RECIPROCALS: 1-99 


N N² √N 1/N | N N² √N 1/N | N N² √N 1/N
1 1 1.000 1.0000 | 34 1,156 5.831 .0294 | 67 4,489 8.185 .0149
2 4 1.414 .5000 | 35 1,225 5.916 .0286 | 68 4,624 8.246 .0147
3 9 1.732 .3333 | 36 1,296 6.000 .0278 | 69 4,761 8.307 .0145
4 16 2.000 .2500 | 37 1,369 6.083 .0270 | 70 4,900 8.367 .0143
5 25 2.236 .2000 | 38 1,444 6.164 .0263 | 71 5,041 8.426 .0141
6 36 2.449 .1667 | 39 1,521 6.245 .0256 | 72 5,184 8.485 .0139
7 49 2.646 .1429 | 40 1,600 6.325 .0250 | 73 5,329 8.544 .0137
8 64 2.828 .1250 | 41 1,681 6.403 .0244 | 74 5,476 8.602 .0135
9 81 3.000 .1111 | 42 1,764 6.481 .0238 | 75 5,625 8.660 .0133
10 100 3.162 .1000 | 43 1,849 6.557 .0233 | 76 5,776 8.718 .0132
11 121 3.317 .0909 | 44 1,936 6.633 .0227 | 77 5,929 8.775 .0130
12 144 3.464 .0833 | 45 2,025 6.708 .0222 | 78 6,084 8.832 .0128
13 169 3.606 .0769 | 46 2,116 6.782 .0217 | 79 6,241 8.888 .0127
14 196 3.742 .0714 | 47 2,209 6.856 .0213 | 80 6,400 8.944 .0125
15 225 3.873 .0667 | 48 2,304 6.928 .0208 | 81 6,561 9.000 .0123
16 256 4.000 .0625 | 49 2,401 7.000 .0204 | 82 6,724 9.055 .0122
17 289 4.123 .0588 | 50 2,500 7.071 .0200 | 83 6,889 9.110 .0120
18 324 4.243 .0556 | 51 2,601 7.141 .0196 | 84 7,056 9.165 .0119
19 361 4.359 .0526 | 52 2,704 7.211 .0192 | 85 7,225 9.220 .0118
20 400 4.472 .0500 | 53 2,809 7.280 .0189 | 86 7,396 9.274 .0116
21 441 4.583 .0476 | 54 2,916 7.348 .0185 | 87 7,569 9.327 .0115
22 484 4.690 .0455 | 55 3,025 7.416 .0182 | 88 7,744 9.381 .0114
23 529 4.796 .0435 | 56 3,136 7.483 .0179 | 89 7,921 9.434 .0112
24 576 4.899 .0417 | 57 3,249 7.550 .0175 | 90 8,100 9.487 .0111
25 625 5.000 .0400 | 58 3,364 7.616 .0172 | 91 8,281 9.539 .0110
26 676 5.099 .0385 | 59 3,481 7.681 .0169 | 92 8,464 9.592 .0109
27 729 5.196 .0370 | 60 3,600 7.746 .0167 | 93 8,649 9.644 .0108
28 784 5.292 .0357 | 61 3,721 7.810 .0164 | 94 8,836 9.695 .0106
29 841 5.385 .0345 | 62 3,844 7.874 .0161 | 95 9,025 9.747 .0105
30 900 5.477 .0333 | 63 3,969 7.937 .0159 | 96 9,216 9.798 .0104
31 961 5.568 .0323 | 64 4,096 8.000 .0156 | 97 9,409 9.849 .0103
32 1,024 5.657 .0312 | 65 4,225 8.062 .0154 | 98 9,604 9.899 .0102
33 1,089 5.745 .0303 | 66 4,356 8.124 .0152 | 99 9,801 9.950 .0101


TABLE K
FOUR-PLACE COMMON LOGARITHMS 


N | 0 1 2 3 4 5 6 7
10 | 0000 0043 0086 0128 0170 0212 0253 0294
11 | 0414 0453 0492 0531 0569 0607 0645 0682
12 | 0792 0828 0864 0899 0934 0969 1004 1038
13 | 1139 1173 1206 1239 1271 1303 1335 1367
14 | 1461 1492 1523 1553 1584 1614 1644 1673
15 | 1761 1790 1818 1847 1875 1903 1931 1959
16 | 2041 2068 2095 2122 2148 2175 2201 2227
17 | 2304 2330 2355 2380 2405 2430 2455 2480
18 | 2553 2577 2601 2625 2648 2672 2695 2718
19 | 2788 2810 2833 2856 2878 2900 2923 2945
20 | 3010 3032 3054 3075 3096 3118 3139 3160
21 | 3222 3243 3263 3284 3304 3324 3345 3365
22 | 3424 3444 3464 3483 3502 3522 3541 3560
23 | 3617 3636 3655 3674 3692 3711 3729 3747
24 | 3802 3820 3838 3856 3874 3892 3909 3927
25 | 3979 3997 4014 4031 4048 4065 4082 4099
26 | 4150 4166 4183 4200 4216 4232 4249 4265
27 | 4314 4330 4346 4362 4378 4393 4409 4425
28 | 4472 4487 4502 4518 4533 4548 4564 4579
29 | 4624 4639 4654 4669 4683 4698 4713 4728
30 | 4771 4786 4800 4814 4829 4843 4857 4871
31 | 4914 4928 4942 4955 4969 4983 4997 5011
32 | 5051 5065 5079 5092 5105 5119 5132 5145
33 | 5185 5198 5211 5224 5237 5250 5263 5276
34 | 5315 5328 5340 5353 5366 5378 5391 5403
35 | 5441 5453 5465 5478 5490 5502 5514 5527
36 | 5563 5575 5587 5599 5611 5623 5635 5647
37 | 5682 5694 5705 5717 5729 5740 5752 5763
38 | 5798 5809 5821 5832 5843 5855 5866 5877
39 | 5911 5922 5933 5944 5955 5966 5977 5988
40 | 6021 6031 6042 6053 6064 6075 6085 6096
41 | 6128 6138 6149 6160 6170 6180 6191 6201
42 | 6232 6243 6253 6263 6274 6284 6294 6304
43 | 6335 6345 6355 6365 6375 6385 6395 6405
44 | 6435 6444 6454 6464 6474 6484 6493 6503
45 | 6532 6542 6551 6561 6571 6580 6590 6599
46 | 6628 6637 6646 6656 6665 6675 6684 6693
47 | 6721 6730 6739 6749 6758 6767 6776 6785
48 | 6812 6821 6830 6839 6848 6857 6866 6875
49 | 6902 6911 6920 6928 6937 6946 6955 6964
50 | 6990 6998 7007 7016 7024 7033 7042 7050
51 | 7076 7084 7093 7101 7110 7118 7126 7135
52 | 7160 7168 7177 7185 7193 7202 7210 7218
53 | 7243 7251 7259 7267 7275 7284 7292 7300
54 | 7324 7332 7340 7348 7356 7364 7372 7380


TABLE K. (Continued)


TABLE L


NATURAL TRIGONOMETRIC FUNCTIONS 


ANGLE SIN COS TAN ANGLE SIN COS TAN
0° .000 1.000 .000 45° .707 .707 1.000
1° .018 .999 .018 46° .719 .695 1.036
2° .035 .999 .035 47° .731 .682 1.072
3° .052 .998 .052 48° .743 .669 1.111
4° .070 .997 .070 49° .755 .656 1.150
5° .087 .996 .087 50° .766 .643 1.192
6° .105 .994 .105 51° .777 .629 1.235
7° .122 .992 .123 52° .788 .616 1.280
8° .139 .990 .141 53° .799 .602 1.327
9° .156 .988 .158 54° .809 .588 1.376
10° .174 .985 .176 55° .819 .574 1.428
11° .191 .982 .194 56° .829 .559 1.483
12° .208 .978 .213 57° .839 .545 1.540
13° .225 .974 .231 58° .848 .530 1.600
14° .242 .970 .249 59° .857 .515 1.664
15° .259 .966 .268 60° .866 .500 1.732
16° .276 .961 .287 61° .875 .485 1.804
17° .292 .956 .306 62° .883 .469 1.881
18° .309 .951 .325 63° .891 .454 1.963
19° .326 .946 .344 64° .899 .438 2.050
20° .342 .940 .364 65° .906 .423 2.145
21° .358 .934 .384 66° .914 .407 2.246
22° .375 .927 .404 67° .921 .391 2.356
23° .391 .921 .424 68° .927 .375 2.475
24° .407 .914 .445 69° .934 .358 2.605
25° .423 .906 .466 70° .940 .342 2.747
26° .438 .899 .488 71° .946 .326 2.904
27° .454 .891 .510 72° .951 .309 3.078
28° .469 .883 .532 73° .956 .292 3.271
29° .485 .875 .554 74° .961 .276 3.487
30° .500 .866 .577 75° .966 .259 3.732
31° .515 .857 .601 76° .970 .242 4.011
32° .530 .848 .625 77° .974 .225 4.331
33° .545 .839 .649 78° .978 .208 4.705
34° .559 .829 .675 79° .982 .191 5.145
35° .574 .819 .700 80° .985 .174 5.671
36° .588 .809 .727 81° .988 .156 6.314
37° .602 .799 .754 82° .990 .139 7.115
38° .616 .788 .781 83° .992 .122 8.144
39° .629 .777 .810 84° .994 .105 9.514
40° .643 .766 .839 85° .996 .087 11.430
41° .656 .755 .869 86° .997 .070 14.300
42° .669 .743 .900 87° .998 .052 19.081
43° .682 .731 .933 88° .999 .035 28.636
44° .695 .719 .966 89° .999 .018 57.290
45° .707 .707 1.000 90° 1.000 .000




Absolute value, 139 
Acceptance, region of, 384, 405, 416, 
459, 485, 497 
Accuracy 
of approximate numbers and com- 
putations, generally, 24-30 
and percentage error, 27 
and significant digits, 27 
of a statistic (see statistic in 
question) 
Alienation, coefficient of, 294 
Alpha measures of skewness and 
kurtosis 
accuracy, 185 
computation, 181-185 
interpretation and use, 185-187, 
446-447 
standard errors, 447 
table of 90 and 98 per cent sam- 
pling limits, 566 
use of table, 446 
American Psychological Association, 
335, 371 
American Society of Mechanical 
Engineers, 55, 73 
Analysis of covariance, 515 ff. 
advantages, 522, 523-524 
assumptions in, 522 
differences among means, 516-522 
Analysis of variance, 497 ff. 
advantages, 501-502, 512, 523-524
assumptions in, 501, 507 
correlation ratio, 502-503 
linearity of regression, 503-504 
means in double classification, 
506-513 
means in single classification, 498-502
multiple coefficient of correlation 
R, 504-505 
multiple R in eliminating variables, 
505-506 
nature of, 497-498, 523 
relation to t test, 501, 505
reliability of measurements, 513- 
515 
remainder or error variance, 508-509 
Approximate numbers (see Numbers, 
approximate) 
Arbitrary origin, 94 
Area and frequency (see Frequency
polygon, Histogram, Normal 
curve) 
Arithmetic mean, 91 ff. 
accuracy, 112-113 
computation from 
combined groups, 97 
correlation table, 254 
grouped data, 92-96 
ungrouped data, 91-92 
and confidence limits of population mean, 419-420, 431-432, 462
of distribution of differences, 437- 
438, 466 
and effects of 
constant errors, 352 
errors of measurement, 96, 352, 
449 
grouping errors, 93, 149 
relation to median in skewed dis- 
tribution, 178 


reliability, empirical demonstration,
98-100 
and sample size, 431-432
sampling distribution, 207-209, 
413-416 
standard error, 415, 427 
population finite, 444 
population stratified, 445 
uses, 96-97, 110 
weighted, 97 
Arithmetic means 
combining, 97 
confidence limits for difference be- 
tween, 435-436, 465, 467 
significance of difference between, 
433-436, 436-437, 463-466, 
466-467, 498-502, 506-513, 
516-522 
Arkin, H., 74
Association (see Correlation) 
Attenuation, correction for, 353 
Attributes, 21 
correlation of, 264-265 
sampling of, 394-400 
Authority, in problem solving, 1 
Average, 75 ff. 
accuracy, 112 
limitations, 110, 116-117, 169 
meaning and function, 78 
misuses, 14, 169 
moving, 65 
uses, 110-111 
(see also Arithmetic mean, Geo- 
metric mean, Harmonic mean, 
Median, Mode) 
Average deviation, 138 ff.
accuracy, 174-175 
computation, 139-140 
relation to quartile and standard 
deviations, 170 
reliability, empirical demonstra- 
tion, 171-173 
standard error, 428 
uses and limitations, 140-141, 169-170, 173-174


Index 


Average deviations, significance of 
difference between, 444 


Bakst, A., 34
Bartlett, M. S., 465, 487, 525 
β₁ and β₂ measures of nonnormality,
181n
Beta coefficients
accuracy, 312 
computation 
four-variable case, 307, 311 
three-variable case, 304-305, 311 
interpretation, 316-319 
standard errors, 428 
Bias (see Errors, constant) 
Bimodality 
factors underlying, 84 
importance, 81-84 
Binomial expansion 
probabilities from, 396-397 
summing terms of, 400-404, 408 
table of coefficients, 398 
Binomial sampling distribution, 394 ff.
approximation by normal curve, 
400-404 
thumb-rule for goodness of fit, 
404 
development 
experimental, 397-400 
theoretical, 394-397 
parameters, 397 
uses 
inferences from a proportion, 
404-409
sign test, 409-412 
Biserial correlation, 245 ff. 
normalized biserial coefficient r_b
computation, 248-249
Flanagan’s approximation, 363-364
standard error, 428 
point biserial coefficient ry, 
computation, 247-248 
formula, derivation of, 531-532 
significance, 469 
uses, 246-247, 249, 362-365 




Bivariate data, 230
Bivariate frequency distribution, 251
Brigham, C. C., 371
Brinton, W. C., 7:
Burgess, R. W., 188
Burke, C. J., 491, 526
Buros, F. C., 188
Buros, O. K., 188
Butsch, R. L. C., 74, 115


Carver, H. C., 228 
Causation and correlation, 15, 241- 
242 
Centiles, 127 
Central tendency, measures of, 78 ff. 
(see also Average) 
Chi square sampling distribution, 
472 ff. 
assumptions underlying, 476-477 
and correction for continuity, 482-483, 491
and critical ratio, 479, 482 
curves, 475 
degrees of freedom, 477, 479, 480, 
483, 485, 487 
development, experimental, 473-474 
equation, 475n 
and F distribution, 522 
table of, 561 
use of table, 476 
uses 
combining results of tests of significance, 490-491
in frequency data 
analyzing contingency data, 
479-482
comparing observed distribu- 
tions, 482 
testing goodness of fit, 483-485 
testing hypothesis, theoretical 
frequencies known, 477-479 
in measurement data 
drawing inferences from a 
variance, 485-486 
testing homogeneity of 
variances, 486-489 
Class interval (see Frequency distribution, terminology)

Coded scores, 96, 147, 183, 253 

Colton, R. R., 74 

Comparable scores (see Scores, com- 
parable) 


Confidence interval, 389-390 (see
also Confidence limits)
Confidence limits, 389, 470 
correlation coefficient, 426 
difference between means, 435- 
436, 465, 467 
mean, 419-420, 431-432, 462 
predicted score, 284-286 
proportion, 408-409, 422 
true score, 340, 350 
variance, 486 
other statistics, 429 
Contingency correlation, 261 ff. 
and attributes, 264-265 
coefficient of 
computation, 262-263 
relation to r_xy, 264
significance, 264, 479-482 
meaning, 261-262, 263-264
uses and limitations, 264-265 
Contingency table, 261-262, 480-481 
Continuous series, 20 
Correlation, 229 ff. 
and causation, 15, 241-242
importance, 230-232
meaning, 229-230, 241-243
(see also Biserial, Contingency, 
Fourfold, Intraclass, Multiple, 
Partial, Rank Difference cor- 
relation and Correlation, prod- 
uct-moment coefficient of) 
Correlation, net, 297, 302 
Correlation, product-moment coeffi- 
cient of, 233 ff. 
accuracy, 239-240 
approximations, 245, 255, 258, 264 
assumptions in, 242-243 
attenuation, correction for, 353 
computation from 
grouped data, 251-255 
ungrouped data, 238-239
two or more coefficients, 426-427
factors affecting 
errors of measurement, 352-353 
variability, 294-296 
meaning and interpretation, 289-296
relation to regression coefficients, 
271-272 
sampling distribution, 422-425 
significance, 469 
standard error, 422 
z transformation
in estimation, 426 
in testing hypothesis, 424-426 
Correlation, product-moment coeffi- 
cients of 
combining, 426-427 
significance of difference between, 
443-444, 467-468
Correlation ratio, 320 ff.
computation, 322-325 
interpretation, 325-326 
and linearity of regression, 326,
502-503
Correlation table, 251 
Covariance, 516 (see also Analysis of 
covariance) 
Cowden, D. J., 115 
Criterion variable, 282, 312 
Critical ratio 
accuracy, 417n 
and errors of measurement, 450 
meaning, 406, 417, 434 
relation to F, 522 
relation to chi square, 479, 482 
and t test, 462, 465-466
Critical region (see Rejection, region 
of) 
Croxton, F. E., 115
Cumulative frequency curve, 60 
Cumulative percentage curve (see 
Percentage curve) 
Curvilinear relationship (see Corre- 
lation ratio) 
Decile deviation D, 125
accuracy, 174-175 
computation, 127 
reliability, empirical demonstra- 
tion, 171-173 
standard error, 428 
Deciles, 125 
Degrees of freedom, 459-461 (see 
also Chi square, F, and t
sampling distributions) 
Dependent variable, 271, 280, 304, 
311 
Determination, coefficient of, 292 
Differences, four explanations, 450-451
Differences between statistics, sig- 
nificance of (see statistics in 
question) 
Difficulty, test item, 359-362 
Discrete series, 20 
Discrimination, test item, 362-366 
Dispersion (see Variability) 
Distribution (see Frequency dis- 
tribution, Sampling distribu- 
tion) 
Dixon, W. J., 188, 525 
Duncan, A. J., 35, 206, 228, 327, 526 
Durost, Walter, 74 


Educational Testing Service, 167, 364 
Eells, Kenneth, 371 
Efficient statistic, 388 
Equivalence, coefficient of, 335 
Error, probable, 421 
Error, standard (see Standard error) 
Errors 
constant, 329, 351, 352 
distribution of, 68, 206-207 
of estimate (see Standard error of 
estimate) 
of grouping, 64 
of measurement or observation, 
329, 338-340, 372 
correlated, 354 
effect on correlation coefficient, 
352-353 




effect on mean, 96, 352, 449 
effect on standard deviation, 352 
effect on statistical inference, 
448-450
homoscedasticity of, 350, 490 
and interpretation of test scores, 
349 ff. 
and reliability of evidence, 338 ff. 
(see also Standard error of meas- 
urement) 
sampling, 65, 328, 372 ff. 
Types I and II in testing hypotheses, 383
Estimation of parameters 
interval, 389-390 
point, 388-389 
and sample size, 390-391, 470-471 
(see also Confidence interval, Con- 
fidence limits) 
Evidence, conditions of trustworthy, 
328 ff.
Evidence, in problem solving, 1 


F ratio, 493 
F sampling distribution, 493 ff. 
curves, 494, 495 
degrees of freedom, 493-494, 499, 
508-509, 520
development, experimental, 494- 
495 
relation to chi square, CR, and t,
522 
table of, 562-565 
use of table, 496 
uses (see Analysis of covariance; 
Analysis of variance; Vari- 
ances, homogeneity, tests 
for) 
F test (see F sampling distribution, 
uses) 
Fact, 1n
Fisher, R. A., 4, 5, 6, 34, 185, 231, 
316, 327, 423, 434, 455, 463, 
491, 514, 525, 526, 559, 
560 
Flanagan, John C., 363, 364, 371 
Fourfold correlation, 255 ff.
point coefficient 
computation, 255-257 
formula, derivation of, 532 
significance, 481 
in test item analysis, 366-367 
tetrachoric coefficient 
in approximating r_xy, 258, 259
assumptions in, 258 
computation, 258-259 
significance, 481 
Fourfold table, 256, 257 
Frequency curve, cumulative, 60 
Frequency distribution, 39-45 
bivariate, 251 
characteristics, 76 
construction, 43-45 
of continuous series, 40 
descriptive constants, 185 
of discrete series, 39 
errors in, 64-67 
graphical representation, 57-62 
smoothing, 65 
terminology 
class interval, 42 
class midpoint, 43 
indicated class limits, 42 
real class limits, 43 
types 
bimodal, 81 
leptokurtic, 71-72 
multimodal, 81 
normal, 68-69, 189 ff.
platykurtic, 71-72 
skewed, 70-71 
other, 73 
Frequency polygon, 58-60 
area and frequency relationships 
in, 59 
construction, 61 
as a line graph, 59 
smoothed, 60 


Galton, Francis, 4, 19, 68, 69, 74, 274, 
327 
Geary, R. C., 188 
Geometric mean, 102 ff.
accuracy, 113 
computation, 105 
standard error, 427 
uses 
as an average of ratios, 103-104,
111
in the positively skewed distribution, 104-107, 111
Goodness of fit, test for, 483-485 
Gosset, W. S., 455, 526 
Graphic presentation, generally, 46-56 
Graphs 
common types, 46-53
construction, 53-55
purposes, 46 
Grouping data 
assumptions in, 64, 86, 92, 94, 131, 
139, 272, 322 
errors of, 64 
purposes, 36 
Guilford, J. P., 223, 228, 371 


Harmonic mean, 108 ff. 
accuracy, 113 
and arithmetic mean, 108 
uses, 108-109, 111 
Harris, J. A., 161, 188 
Hartley's test of homogeneity, 487n 
Histogram, 57-58 
area and frequency relationships 
in, 58 
construction, 61 
Homogeneity 
assumption of, 465-466, 486-487, 
496, 501, 522 
and bimodal distribution, 84 
meaning, 10 
of variances 
importance, 486-487, 496 
test for, 486-489, 496-497 
Homoscedasticity 
assumption of, 283, 287, 303, 318, 
325, 350, 357, 487 
meaning, 283 
test for, 489-490 
Horn, Daniel, 245, 327
Hotelling, Harold, 467, 526 
Hypothesis, statistical or null, 380 ff.
acceptance of, 381, 383 
and type II error, 385-388, 405, 
451 
nature of, 380-381 
necessity, 380 
rejection of, 381 
and type I error, 383, 385, 451 
testing 
controlling risk of error, 385-388
effect of sample size, 388 
effect of statistic used, 388 
level of significance, 381 
mistakes or errors in, 382-385 
power of a test, 385n 
region of acceptance, 384, 405, 
416, 459, 485 
region of rejection, 384, 405, 
416-417, 425, 434-435, 459,
485, 496 
subjective features in, 381-382, 
385 
(see also Binomial, Chi square, F, 
Normal, and t sampling distributions, uses)


Independence, statistical, 320 
Independence of contingency data, 
test for, 479-482 
Independent variable, 280, 304 
Inertia, in problem solving, 1 
Inference, statistical, 372 ff. 
bases of, 375, 378 
effects of errors of measurement, 
448-451 
mistakes or errors in, 382-383 
nature of, 8, 373, 374 
the two general problems, 375 
(see also Estimation; Hypothesis, 
testing) 
Internal consistency, coefficient of, 
335 
Interpercentile range, 125, 127, 129 
Interquartile range, 121 




Interval, class, 42
Interval, confidence (see Confidence
interval, Estimation)
Intraclass correlation coefficient, 514
Intuition, in problem solving, 1
Item, statistical, 21
Item analysis (see Test item analysis)


Johnson, P. O., 4, 34 


Kelley, T. L., 203, 228, 327, 431, 526 
Kendall, M. G., 228, 470, 526
Kenney, J. F., 115 
Knowledge, sources of, 1 
Kuder, G. F., 367, 371
Kuder-Richardson estimates of test 
reliability, 367 
Kuebler, Maxwell, 410, 526 
Kurtosis, 71-72, 76, 177 ff. 
importance, 185-187 
measures of (see Alpha measures, 
Percentile measures) 


Law, normal, 68, 189 
Law of regression, 274 
Law of single variable, 524 
Least squares, method of, 269-270 
Leptokurtosis, 71-72
Level of significance, 381, 451 
Lewis, Don, 491, 526 
Likelihood, maximum, 388 
Lindquist, E. F., 223, 228, 371 
Line of regression (see Regression, 
linear) 
Linearity of regression 
assumption of, 242, 283, 287, 302, 
318, 326, 503, 504, 522 
test for, 503-504
Logarithms 
in normalizing positively skewed 
data, 107 
table of common, 580-581 
Long, J. A., 371 


Margenau, Henry, 228 
Massey, F. J., 188, 525 
McCall, W. A., 219n, 228

McCally, Sarah, 327 

McNemar, Quinn, 14, 35, 440, 526 

Mean (see Arithmetic mean, Geo- 
metric mean, Harmonic mean) 

Mean deviation (see Average devia- 
tion) 

Measurement, standard error of (see 
Standard error of measure- 
ment) 

Measurements 

accuracy, 27 
approximate nature, 24-25 
errors of (see Errors of measure- 
ment) 
precision, 25 
significant digits in, 26-27 
Median, 86 ff. 
accuracy, 113 
computation from 
grouped data, 86-89 
ungrouped data, 86 
relation to mean in skewed dis- 
tribution, 178 
reliability, empirical demonstra- 
tion, 98-100 
standard error, 427 
uses and limitations, 89-90, 110 

Medians, significance of difference 
between, 444 

Merrington, Maxine, 565 

Mesokurtosis, 71-72 

Mid-measure, 86 

Misuses of statistics, 14-16 

Mitchell, W. C., 74 

Mode, 79 ff. 

accuracy, 113 

importance, 81-84 

reliability, empirical demonstra- 
tion, 98-100 

uses and limitations, 84-85, 111 

Moments, statistical 

computation, 181-182, 183-184 

defined, 180-181 

use in measuring skewness and 
kurtosis, 180 ff. 
Moving average, 65
Mueller, C. G., 448, 526 
Multiple correlation, coefficient of, 
308 ff.
bias in, 318 
computation, 308-309 
effect of dropping variables, 505- 
506 
interpretation, 309, 316-318 
sampling distribution, 505 
significance, 314, 504-505 
standard error, 504 
Multiple regression, 303 ff. 
assumptions in, 318-319 
beta coefficients 
four-variable case, 307, 311 
standard errors, 428 
three-variable case, 304, 311 
equations 
four-variable case, 306, 311 
three-variable case, 304, 311 
limitations, 312-313, 315-316 
and sample size, 314, 430 
selection of variables, 314-315 
standard error of estimate, 310-311 
uses 
elimination of known effects, 313 
prediction, 312 
test selection, 313 


National Bureau of Standards, 408 
Net correlation, 297, 302 
Nondetermination, Coefficient of, 292 
Nonlinear correlation (see Correla- 
tion ratio) 
Nonnormality 
effects of, 73, 186, 447-448, 465, 
523 
resulting from selection, 204-205 
Normal curve, 189 ff. 
as approximation to binomial dis- 
tribution, 403 
equations, 192-194 
fitting to given distribution, 199-201
as limiting form, 189-191 
properties, 194-195
standard or unit form, 193-194 
areas and frequencies in, 195-199
table of areas, 557 
use of table, 196-199 
table of ordinates, 558 
use of table, 195 
uses 
in educational measurements, 
210-224 
in statistical inference (see Nor- 
mal sampling distribution) 
Normal law, 68, 189 
Normal sampling distribution, 413 ff. 
assumptions in, 413, 434, 446, 451 
limitations, 430 
uses 
inferences from
correlation coefficient, 424-426
mean, 416-420 
proportion, 421-422 
statistics, generally, 427-430 
testing significance of difference 
between 
correlation coefficients, 443- 
444 
means, 433-436, 436-437 
proportions, 437-440, 440-441
standard deviations, 441-442 
statistics, generally, 444 
Normality 
assumption of, 186, 243, 248, 258, 
284, 287, 318, 350, 413, 446, 
451, 459, 465, 467, 485, 493, 
501, 507, 520, 522 
concept of, 203-210 
conditions of, 204 
as experimental fact, 206 
tests for, 446-447, 483-485 
Normalizing data, 216-219 
Null hypothesis (see Hypothesis, 
statistical) 
Numbers 
approximate, 24-30 
accuracy, 27 
computing with, 27-30
precision, 25 
sources, 24 
exact, 23-24 
computing with, 24, 30 


Observations, 21 

Ogive shape, 129 

Open-end distribution, 90 

Order of partial correlation coeffi- 
cient, 302, 303 

Ordered data, percentile rank of, 132-134

Ordinates of normal curve (see Nor- 
mal curve) 

Organization of statistical data, 
36 ff. 


Otis, Arthur S., 345, 371 


Paired data, 409, 436, 466 

Parameter, 373n (see also Estimation
of parameters)

Partial correlation, 297 ff. 
assumptions in, 302-303 
coefficient of 

first order, 302 

second order, 303 

significance, 469 

standard error, 428 

z transformation, 427
as correlation of residuals, 298-301 
and experimental control, 298, 303 
uses and limitations, 302-303 

Partial regression coefficient, 304n 

Pearson, E. S., 446, 566 

Pearson, Karl, 4, 233 

Peatman, J. G., 569 

Percentage curve, cumulative, 127 ff. 
construction, 127-129 
and form of distribution, 129 
uses 

comparing distributions, 130, 135 
determining percentiles and per- 
centile ranks, 129 
Percentage error of approximate 
number, 27, 30 
Percentile approximations of standard deviation, 175-176
Percentile curve or chart (see Per- 
centage curve, cumulative) 
Percentile measures of 
kurtosis, 179-180 
skewness, 178-179 
variability, 119-127, 135 
Percentile rank or score, 129, 131 ff. 
determination 
by computation, 131-132 
from normal curve, 164, 211- 
212 
from percentile curve, 129, 131 
limitations, 136 
meaning, 131 
in ordered data, 132-134 
and standard scores, 166, 211-212 
Percentiles, 125 ff. 
determination 
by computation, 125-126 
from normal curve, 211-212
from percentile curve, 129 
meaning, 125 
uses and limitations, 134-137 
Peters, C. C., 188, 327, 526, 560 
Phi coefficient, 256 
Phenomenon, 19 
Platykurtosis, 71-72 
Playfair, William, 46 
Polygon, frequency (see Frequency 
polygon) 
Population, statistical, 6 ff. 
distribution, 378 
finite, 7, 444 
infinite, 7 
stratified, 11-12, 445-446 
two-fold, 394 
Power of a statistical test, 385n 
Precision of measurement, 25 
Prediction, statistical, 281-288, 312-313
assumptions and conditions of, 287 
limitations, 312-313, 430, 470 
reliability, 283-287, 312, 314 
Predictor variable, 282, 312 
Presentation of statistical data, 36 ff.
Probability 
figures 
how expressed, 378 
interpretation of high, 382, 476 
interpretation of low, 382, 476 
limits of, 382 
and inference, 375, 378 
intuitive interpretation, 382n 
meaning, 375-376 
rules 
addition, 376 
multiplication, 377 
Probable error, 421 
Proportion, 21-22 
as arithmetic mean, 112 
and confidence limits of popula- 
tion proportion, 408-409, 422 
sample size needed for specified 
reliability, 432-433 
standard error 
population finite, 445 
population stratified, 446 
Proportions 
combining, 112, 439 
significance of difference be- 
tween, 437-441 


Qualitative series, 21 
and attributes, 21 
central tendency, 111 
combining, 215-216 
transforming, 212-215 
Quantitative series, 20, 23 
characteristics, 76 
continuous, 20 
discrete, 20 
normalizing, 216-219 
Quartile deviation, 119 ff. 
accuracy, 174-175 
computation, 121-122 
relation to average and standard 
deviations, 170 
reliability, empirical demonstra- 
tion, 171-173 
standard error, 428 
uses and limitations, 122-124, 169-
170, 173-174 
Quartile deviations, significance of 
difference between, 444 
Quartiles, 119 
computation, 119-120 
significance of difference between, 
444 
standard error, 427 
Questionnaires, reliability of, 260, 
3 6 
Quetelet, A., 4, 68, 69 


Random errors (see Errors of meas- 
urement) 
Random numbers, table of, 568- 
569 
use of table, 9-10, 98-99 
Random sample (see Sample, 
random) 
Range, 118 
interpercentile, 125 
semi-interquartile, 121 
uses and limitations, 118, 173
Rank difference correlation, 243 ff. 
coefficient of 
computation, 244-245 
significance, 469 
uses, 245 
Ranking data, 134, 245 
Ratings, transformation of, 132, 
212-216 
Ratio, critical (see Critical ratio) 
Reciprocals, table of, 579 
Reduction of data (see Statistical 
methods, necessity of) 
Regression, linear, 267 ff.
coefficients of 
computation, 270-272 
relation to correlation coeffi- 
cient, 271-272 
standard error, 428 
significance, 468 
linearity, test of, 503-504 
lines, equations of, 269-272, 274, 
279-280 




tendency or “law,” 274-276
uses, 279-280 
(see also Multiple regression; 
Prediction, statistical) 
Regression coefficients, significance 
of difference between, 444, 469 
Rejection, region of, 384, 405, 416- 
417, 425, 434-435, 459, 485, 
496
Relative frequency, 22, 130, 198, 
376 
Relative standard deviation (see 
Variation, coefficient of) 
Reliability, test item, 365 
Reliability of evidence, 332 ff. 
Reliability of statistics 
averages, empirical demonstration, 
98-100 
and confidence interval, 390-391 
meaning, 171, 390 
measures of variability, empirical 
demonstration, 171-173 
and sample size, 232, 388, 431- 
433, 470-471
and significance, 390-391 
Reliability of test scores 
coefficient of, 334-336
effect of range of talent, 351- 
352 
as ratio of true score and 
obtained score variances, 343, 
514 
and standard error of measure- 
ment, 341-343 
interpretation 
data needed in, 356-357 
estimated vs real reliability, 351 
in light of standard error of 
measurement, 349-351, 353
in light of use of scores, 351-354 
methods of estimating 
advantages and limitations, 
354-356 
assumptions underlying, 348 
half-tests, 334-335 
intraclass correlation, 513-515
Kuder-Richardson, 367-368
parallel forms, 334-335 
test-retest, 334-335 
Reporting reliability data, 356-357 
Reporting results of approximate 
computation, 30, 431 
Representative sample, selection 
of, 8 
Research and statistics, 2 ff., 374-375
Residuals, 298, 313, 518-519 
Richardson, M. W., 367, 371 
Rider, P. R., 469, 523, 526 
Root-mean-square, 156 
Rounding numbers, rules for, 27- 
28 
Rulon, P. J., 345, 371 


s, as estimate of population stand- 
ard deviation, 143, 455, 463 
s², as estimate of population
variance, 493 
Sample 
distribution, 378 
nonrandom, 13 
generalizing from, 13-14 
limitations, 13, 15 
random, 9 
advantages, 11 
methods of selecting, 9-12 
as representative sample, 9, 373 
simple, 373 
stratified, 11-12, 445-446 
size, 232, 373, 388, 431-433, 470-
471 
Sampling distribution 
experimental, 379 
in inference, 378, 380, 391 
meaning, 378-380 
(see also Binomial, Chi square, F, 
Normal, and t sampling
distributions) 
Sampling error, 65, 328, 372 ff. 
Sandiford, P., 371 
Scale of scores, 21 
Scaling test items, 222-223, 362 
Scatter (see Variability)
Schafer, R., 569 
Score 
observed or obtained, 338 
percentile (see Percentile rank) 
standard, 162 (see also Standard 
scores) 
stanine, 211 
T, 219-220
true, 338
z, 162
Z, 167, 219-220 
Scores, 21 
comparable, 127, 166 
scale of, 21 
significance of difference between, 
450 
Seidman, Frances, 516, 526 
Semi-interquartile range, 121 
Series (see Statistical series) 
Sheppard's correction for standard 
deviation, 149-150 
Shuster, C. N., 35 
Sigma scores, 162n 
Sign test, 409-412 
Significance, statistical, 390, 434 
and reliability, 390-391 
and utility, 391 
Significance, tests of, 382, 390, 434
Significant digits, 26 
and percentage error, 27 
Skewness, 70-71, 76, 177 ff. 
importance, 185-187 
measures of (see Alpha measures, 
Percentile measures) 
Smith, J. G., 35, 206, 228, 327, 526 
Smoothing the frequency 
distribution, 65-67 
Snedecor, G. W., 161, 188, 526 
Spearman-Brown formula, 334 
Squares and square roots 
tables of, 570-578, 579 
Stability, coefficient of, 335 
Standard deviation, 142 ff. 
accuracy, 174-175
approximating from percentiles,
175-176 
computation from 
combined groups, 150-152 
correlation table, 254 
grouped data, 145-149 
ungrouped data, 142, 144-145 
and effects of 
constant errors, 352 
errors of measurement, 352, 354 
grouping errors, 149 
formulas, derivation of, 529-530 
as minimum root-mean-square, 
156, 530 
population estimate, 143, 173, 416, 
455, 463 
relation to average and quartile 
deviations, 170 
reliability, empirical demonstra-
tion, 171-173
of sampling distribution, 415, 420
Sheppard's correction for, 149-
150
standard error, 428 
uses and limitations, 152-156, 
169-170, 173-174 
Standard deviations 
combining, 150-152 
significance of difference between, 
441-442 
Standard error, 420-421, 427-428 
and probable error, 421 
and reliability of a statistic, 431 
and sample size, 431 
as standard deviation of sampling 
distribution, 415, 420 
Standard error of a difference 
between 
correlated sample 
means, 436-437 
proportions, 440 
standard deviations, 442 
other statistics, 444 
independent sample 
correlation coefficients, 443 
means, 434
proportions, 439
standard deviations, 442 
other statistics, 444 
Standard error of a difference
between obtained scores, 450
Standard error of estimate, 276 ff.
formula, derivation of, 533
and interpretation of rxy, 290-294
in multiple regression, 310-311 
and reliability of prediction, 283- 
287, 312 
in simple regression, 276-279
Standard error of measurement,
341 ff.
methods of estimating, 343-348,
515
and reliability coefficient, 341-343
and reliability of a mean, 449-450
use in interpreting an observed
score, 349-351, 515
Standard scores, 162 ff. 
accuracy, 167 
interpretation and use, 163-166, 187 
mean, 163 
and percentile ranks, 161, 166, 
211-212 
in product-moment correlation, 
237-238 
standard deviation, 163 
transformations, 166-167 
mean, 167, 530-531 
standard deviation, 167, 530-531 
Stanine score, 211 
Statistic, 5 
Statistical data, 19 ff.
Statistical inference (see Inference,
statistical) 
Statistical items, 21 
Statistical methods 
broad uses, 3 
criticism, 3 
how to study, 32-34 
necessity 
in inference, 6, 372 ff. 
in reduction of data, 5, 127, 185 
origin, 4
Statistical series, 20
continuous, 20 
discrete, 20 
qualitative, 21 
quantitative, 20 
Statistics 
descriptive, 8 
misuses, 14-16 
and research, 2 ff., 374-375
sampling, 8 
three meanings of, 4 
Stephan, F. F., 14, 35 
Stratified sample (see Sample, 
stratified) 
"Student," 455, 526
Sum of products, 515 
Sum of squares, 143, 464, 493, 499, 
510, 521 
Symonds, P. M., 371 


t ratio, 455
and critical ratio, 462, 465-466
t sampling distribution, 454 ff.
curves, 458
degrees of freedom, 459-461, 463,
468, 469
development, experimental, 455-
456
equation, 456-457
limitations, 465-466, 470, 497
relation to F, 522
table of, 560 
use of table, 458-459 
uses 
inferences from mean, 461-463
t test of difference between
correlated coefficients of corre-
lation, 467-468
correlated sample means, 466-
467
independent sample means,
463-466
t test of significance of 
partial correlation coefficient, 
469
point biserial correlation coeffi-
cient, 469
product-moment correlation 
coefficient, 469 
rank difference correlation 
coefficient, 469 
regression coefficient, 468-469
T score, 219
t test (see t sampling distribution,
uses) 
Tables, statistical, 36-38 
construction, 37 
general purpose, 37 
special purpose, 37 
Tabulation of data 
in correlation table, 251-253 
in frequency distribution, 
44 
Tate, M. W., 35, 115, 371
Test item analysis, 359 ff. 
chart, 359 
and test improvement, 369 
test item 
difficulty, 221-224, 359-362 
discrimination, 362-366 
intercorrelation, 366-367 
validity, 368-369 
variance, 361 
Tests of significance, 382, 390, 434 
Tetrachoric correlation (see Fourfold 
correlation)
Thompson, Catherine, 561, 565
Thorndike, R. L., 327 
Transformations of 
nonnormal data, 216-219, 448
qualitative data, 212-216 
standard scores (see Standard 
scores, transformations) 
True score, 338 
2 × 2-fold table, 256, 257


Universe (see Population) 

U. S. Bureau of Census, 74

U. S. Department of Agriculture, 
74
Validity
coefficient, 332 
experimental, 330 
formal, 330 
test item, 365, 368-369 
Van Voorhis, W. R., 188, 327, 526, 
560 
Variable, 19-20 
criterion, 282, 312 
dependent, 271, 280, 304, 311 
independent, 280, 304 
predictor, 282, 312 
qualitative, 21 
quantitative, 20 
Variability 
and coefficient of correlation, 294-
296
meaning and importance, 76, 116- 
118, 465-466, 486-487, 496, 513 
measures of, interpretation and use, 
169 ff. 

(see also Average Deviation, In- 
terpercentile measures, Quar- 
tile deviation, Range, Standard 
Deviation) 

and reliability coefficient, 351, 352 
and stratified sample, 12, 445 
Variance, 143, 181, 291 
analysis of (see Analysis of vari- 
ance) 
confidence limits for, 486 
explained and unexplained, 291- 
292, 309, 316-318 
unbiased estimate, 493 
Variance error, 420 
Variance error of estimate, 291 
Variance ratio (see F ratio) 
Variances 
homogeneity, tests for, 486-489,
496-497
significance of difference between, 
496-497 
Variate, 19-20 
Variation, coefficient of, 158 ff.
standard error, 428 
uses, 174
comparing variabilities when 
means are unequal, 158 

comparing variabilities when 
units are unlike, 160 

judging exceptional variation, 
160-161 


Walker, Helen, 35, 74, 115, 374, 
526 

Weighted mean, 97 

Welch, B. L., 526

Yates, F., 560

Yates’ correction for continuity, 482- 
483, 491 

Yule, G. U., 228, 470, 526


z score, Z score (see Score) 
z, transformation of rxy, 423-424
standard error, 428 
table of, 559 
zero-order correlation coefficient, 302, 
303